ABSTRACT


In this thesis, the principles of Software Defined Radio are demonstrated by
implementing a Binary Frequency Shift Keying (BFSK) transmitter and receiver in a Field
Programmable Gate Array (FPGA). After introducing the theory behind the non-coherent
BFSK demodulation implemented at the receiver, the design of both transmitter
and receiver is illustrated. The design environment of choice is MathWorks® Simulink
with Xilinx® System Generator, a dedicated block library for Simulink. The design
is downloaded to a Virtex-4 FPGA.

The receiver is Non-Coherent (NC) in the sense that it need not know
the phase of the incoming signal. A feedback circuit is responsible for both packet and bit
synchronization. The receiver is implemented using non-coherent matched filters
instead of low-pass filters, which would be easier to implement but would degrade performance.
Finally, some of the practical lessons learned during the design process are
discussed.

In Appendix A, different technological options for implementing
communication modulation techniques and Software Defined Radio are evaluated. These options
include Digital Signal Processors, Field Programmable Gate Arrays, General Purpose
Processors, and Application Specific Integrated Circuits, and a comparison between these
choices is made.
EXECUTIVE SUMMARY


Software Defined Radio (SDR) is a new and fascinating idea having its roots in
the early 1990s. Technological constraints prevented this idea from becoming a reality at the
beginning, but the development of powerful Field Programmable Gate Arrays (FPGAs)
has increased interest in the SDR concept. FPGAs combine versatility, reconfigurability,
and upgradability to a degree that is hard to find in any other device.

A simple way to make Software Defined Radio a reality is to store transceiver
designs for many modulation schemes in memory and download the selected one to an
FPGA as needed. This goal is accomplished when transceivers for all modulation
schemes and services of choice are designed and synthesized for the target FPGA.
As a first step in this procedure, the design of a Binary Frequency Shift Keying
transmitter and receiver is the main purpose of this thesis.

BFSK is the modulation that uses two different frequencies for the binary 0 and
binary 1 symbols of the input stream. This modulation is simple, but the timing
synchronization of the receiver still poses many challenges. A non-coherent receiver was
chosen to eliminate the need for phase synchronization. Such a
receiver, along with the timing issue, is described in Chapter II. Given that Forward Error
Correction is used in the transceiver design, an introduction to convolutional encoding is
also given in Chapter II.

To make a good design, the proper software must support the effort. System
Generator is a program provided by Xilinx to help with the design of a project, offering an
environment familiar to most engineers, namely MathWorks’ Simulink, with a complete
library of synthesizable blocks. This program is supported by the Integrated Software
Environment (ISE) Design Suite, which is the Xilinx software that accepts the code
generated by System Generator and continues the task of implementing the design in the
FPGA and testing the resulting downloaded design. A more complete description is
included in Chapter III.


The transmitter and receiver design made under System Generator is presented in
Chapter IV. The message bits are first encoded using convolutional
encoding, and a preamble is then attached before each packet to facilitate the synchronization
of the receiver. The resulting bits are transmitted according to the general rule
of the BFSK modulation scheme, where binary zeros and ones correspond to two different
frequencies. The receiver uses non-coherent matched filters to extract the transmitted bits
from the received waveform, and a timing circuit provides the bit and
packet synchronization. Finally, the preamble is stripped off and the remaining bits are
fed to a Viterbi decoder that yields the message bits.

The verification of the design follows in Chapter V. This is carried out in the
System Generator environment by examining the signal at different points in the design
and by using Matlab code that simulates part of the receiver. The results are shown and
the design can be considered successful. The problems that were encountered during the
design are also addressed in the second half of this chapter.

A closer look at current FPGA technology is included in Appendix A. Different
technological options for implementing communication modulation techniques and
Software Defined Radio are discussed. These options include Digital Signal Processors
(DSPs), Field Programmable Gate Arrays, General Purpose Processors (GPPs), and
Application Specific Integrated Circuits (ASICs), and a comparison between these
choices is made. The comparison shows that a heterogeneous design that
includes a DSP, a GPP, and an FPGA can provide the maximum performance
and versatility. DSPs perform better in sequential processing, whereas FPGAs are more
efficient in executing parallel tasks. GPPs are used to support the different network
protocols and other similar tasks.

A lower-level description of each block than the design flow of Chapter IV provides
is included in Appendix B. The reason for many of the choices made in the parameters
window of every block is given next to the actual value of the parameter. In this
way, the design can be rebuilt based solely on this appendix. At the same
time, further insight into the dependence of the desired results upon the chosen
parameters is provided. In Appendix C, the Matlab code that helped in the verification of
the receiver design is included.
I.  INTRODUCTION


A.  BACKGROUND  ON  SOFTWARE-DEFINED  RADIO 

A software radio is a radio which uses programmable hardware. Software is used
to configure the hardware to meet different communication scheme specifications as well
as to support several different services. According to J. Mitola (1993), “a software radio
(SR) is a set of Digital Signal Processing (DSP) primitives, a metalevel system for
combining the primitives into communications systems functions (transmitter, channel
model, receiver . . .) and a set of target processors on which the software radio is hosted
for real-time communications [1].” This concept is in contrast to common radio devices
implemented in specific hardware, which provide a limited capability of switching
between modulation schemes and services, mainly due to the static hardware used. An
ideal SR receiver directly samples the antenna output. A software-defined radio (SDR) is
a practical version of an SR. The received signals are sampled after a suitable band
selection filter and frequency down conversion [2].

The flexibility and reconfigurability demonstrated by the SDR have become a
reality largely due to the evolution of digital electronics, which allows processes to be
defined in software instead of using static, application-specific circuits such as mixers, filters,
amplifiers, modulators, demodulators, and detectors.

The concept of SDR has progressed further because of the advancement of Field
Programmable Gate Arrays (FPGAs) and is currently a field of intensive research, even
though FPGAs are not the only platform upon which SDR can be based. General
Purpose Processors (GPPs) and dedicated Digital Signal Processing (DSP) chips provide
an alternative to FPGAs, having their own pros and cons. Nevertheless, the versatility
that FPGAs demonstrate makes them unique in many aspects.

B.  GOALS OF RESEARCH AND CONCEPTS

Recent technological advancements have allowed FPGAs to transform from an
auxiliary device to a signal processing engine. Nowadays, FPGAs not only compete
with dedicated circuits, but also give life to sectors of science that need their
versatility. They have enhanced the Software Defined Radio concept, which is a great
advancement over the conventional radio concept.

The main goal of this research is the design of a Binary Frequency Shift Keying
(BFSK) transmitter and receiver. BFSK modulation is used to illustrate the
techniques for designing a communication system in FPGAs because it is a
simple but robust modulation that can be received non-coherently. The design process
also helps in acquiring greater experience in FPGA design using some of the easier
to use but powerful schematic, synthesis, and place-and-route tools available today.

The second goal of the research is to track the advancements made in the field of
FPGAs and to report on the usefulness and possible implementations of FPGAs.

C.  METHODOLOGY  AND  SCOPE  OF  THE  RESEARCH 

Xilinx’s System Generator 10.1 SP2 is the schematic tool used to design the BFSK
transceiver. After verifying that the design worked correctly, the code of the design was
automatically generated by System Generator and loaded into the Integrated
Software Environment (ISE™) to be synthesized, placed and routed, and finally
downloaded to the target FPGA, which is a Xilinx Virtex-4. Nevertheless, the
verification of the implementation on the chip was not done due to time constraints.

The  main  challenge  to  the  design  is  to  achieve  the  synchronization  required  in 
order  for  the  receiver  to  be  able  to  distinguish  the  beginning  and  end  of  different  packets 
of  incoming  data.  The  length  of  the  packet  was  chosen  to  be  fixed  at  128  bits  and  the  first 
8  bits  compose  the  preamble  that  facilitates  the  bit  synchronization  and  packet  detection. 
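The packet layout described above can be sketched in a few lines of Python. This is only an illustration of the framing idea: the packet and preamble lengths come from the text, whereas the preamble bit pattern `PREAMBLE` and the helper names are hypothetical, since the actual pattern is not given in this chapter.

```python
PACKET_BITS = 128                            # fixed packet length stated in the text
PREAMBLE_BITS = 8                            # the first 8 bits form the preamble
PAYLOAD_BITS = PACKET_BITS - PREAMBLE_BITS   # bits that remain for the data

# Hypothetical preamble pattern, chosen only for illustration.
PREAMBLE = [1, 0, 1, 0, 1, 0, 1, 0]


def build_packet(payload):
    """Prepend the preamble to a payload of exactly PAYLOAD_BITS bits."""
    if len(payload) != PAYLOAD_BITS:
        raise ValueError("payload must contain %d bits" % PAYLOAD_BITS)
    return PREAMBLE + list(payload)


def find_packet_start(bits):
    """Return the first index where the preamble matches and a whole packet fits, else -1."""
    for i in range(len(bits) - PACKET_BITS + 1):
        if bits[i:i + PREAMBLE_BITS] == PREAMBLE:
            return i
    return -1


def strip_preamble(packet):
    """Drop the preamble and return the payload bits."""
    return packet[PREAMBLE_BITS:]
```

A real receiver must also cope with payload bits that happen to imitate the preamble; the sketch ignores that case and simply takes the first match.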

D.  BENEFITS OF THE RESEARCH

The concept of Software Defined Radio is fascinating but complex. Designing
different modulation schemes that can be downloaded to an FPGA is an easy way to
design a simple Software Defined Radio. Moreover, all digital modulations share
the same basic principles; thus, synchronization techniques from one modulation can be
borrowed and modified to work with another modulation scheme. A fully working digital
BFSK transceiver is simulated in this thesis.

The research on FPGAs revealed that, as technology changes, some arguments
that were once made against the use of FPGAs, such as power consumption,
may today be their strong point. The system designer must always be up-to-date
and adaptive regarding new technologies, since FPGAs are going to be used more
extensively in the future [3].

E.  ORGANIZATION  OF  THE  THESIS 

Chapter  II  includes  background  regarding  Binary  Frequency  Shift  Keying.  A 
Non-Coherent  BFSK  receiver  is  presented  in  order  to  facilitate  the  understanding  of  the 
design  that  was  implemented  in  an  FPGA.  Also,  the  concept  of  convolutional  encoding  is 
introduced. 

Chapter  III  contains  the  description  of  the  design  environment  used,  namely 
Xilinx’s  System  Generator,  ISE  and  ChipScope  Pro  along  with  the  characteristics  of  the 
board  used  for  the  design.  The  high  level  of  maturity  and  the  friendly  interface  of  the 
software  product  played  a  key  role  in  the  successful  completion  of  the  whole  project. 

Chapter  IV  gives  a  detailed  description  of  the  software  design  of  a  BFSK 
transmitter  and  receiver.  The  description  includes  the  logic  for  the  design  choices  that 
were  made,  the  reason  behind  the  choice  of  specific  components,  and  the  explanation  of 
the  function  of  many  blocks. 

Chapter V discusses the results obtained by simulation in the design environment.
Input and output are compared using Matlab and the correctness of the results is
discussed.

Chapter VI includes an outline of the work done, the significant results obtained, the
limitations of the design, and recommendations for future work.
In Appendix A, an extensive background regarding FPGAs is given, explaining
why they are well suited for Software Defined Radios. The positive and negative
aspects of FPGAs are mentioned and compared with those of General Purpose Processors,
Digital Signal Processors, and Application Specific Integrated Circuits.

In Appendix B, a detailed description of the design is given on a per-figure and
per-block basis. Reading Appendix B in parallel with Chapter IV provides a better
understanding of the blocks and the reason they were used.

In Appendix C, the Matlab code used to verify the results obtained with System
Generator is shown.

In this chapter, the concept of Software Defined Radio was introduced. The idea
of SDR is realized by building a BFSK transceiver using an FPGA. In order to provide a
solid background to facilitate understanding the design, the next chapter discusses BFSK
modulation and demodulation and convolutional encoding.
II.  BINARY FREQUENCY SHIFT KEYING MODULATION
SCHEME AND CONVOLUTIONAL ENCODING


BFSK is a basic digital modulation scheme. Its concept is not presented in depth here,
but can be found in any introductory communications textbook. The references
used for this brief introduction are [4, p. 198] and [5], both of which
include a detailed description of the BFSK modulation scheme and a BFSK receiver. An
introduction to convolutional encoding is also given at the end of this chapter.

A.  BFSK  MODULATION 

In BFSK, two distinct frequencies are chosen to represent the two possible values
of a bit. The equation that describes the transmission signal $s(t)$ of the $i$-th bit
produced by this modulation technique is the following [5]:

$$s(t) = \sqrt{2}\,A_c \cos\left[2\pi\left(f_c + b(t)\frac{\Delta f}{2}\right)t + \theta_i\right], \quad \text{for } iT_b > t > (i-1)T_b, \tag{2.1}$$

where $T_b$ is the bit duration, $A_c$ is the carrier’s amplitude, $f_c$ is the mean signaling
frequency in Hz, $b(t)$ is the value of the transmitted bit in bipolar form, where 1
corresponds to bit 1 and -1 corresponds to bit 0, $\Delta f$ is the frequency separation of the
two frequencies, and $\theta_i$ is the bit phase.
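To make equation (2.1) concrete, the following minimal Python sketch samples the BFSK waveform bit by bit. It is an illustration only and not the System Generator design: the function name, the sampling rate `fs`, and the use of a single phase value for all bits are assumptions introduced here.

```python
import math


def bfsk_waveform(bits, fc, delta_f, bit_rate, fs, amplitude=1.0, phase=0.0):
    """Sample s(t) = sqrt(2)*A_c*cos(2*pi*(f_c + b*delta_f/2)*t + theta).

    bits      : iterable of 0/1 values
    fc        : mean signaling frequency in Hz
    delta_f   : separation between the two tones in Hz
    bit_rate  : bits per second (1/T_b)
    fs        : sampling rate in Hz (assumed to be a multiple of bit_rate)
    """
    samples_per_bit = int(round(fs / bit_rate))
    out = []
    for i, bit in enumerate(bits):
        b = 1 if bit else -1                  # bipolar form: 1 -> +1, 0 -> -1
        f = fc + b * delta_f / 2.0            # tone selected by the current bit
        for n in range(samples_per_bit):
            t = (i * samples_per_bit + n) / fs
            out.append(math.sqrt(2) * amplitude * math.cos(2 * math.pi * f * t + phase))
    return out
```

For example, `bfsk_waveform([1, 0], fc=4000.0, delta_f=1000.0, bit_rate=1000.0, fs=32000.0)` returns 64 samples, the first 32 at 4500 Hz and the last 32 at 3500 Hz; these numeric values are hypothetical and were chosen only because they satisfy the orthogonality conditions discussed in this chapter.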

A BFSK receiver is classified as coherent or non-coherent depending on whether
knowledge of the phase of the received signal is a prerequisite for the receiver
to work properly. In this thesis, the receiver of choice is non-coherent, which decreases
the complexity of the receiver circuit by eliminating the need for an extra circuit that would
acquire the phase information. The configuration that allows the realization of
Non-Coherent (NC) reception is the energy detector. A diagram of an NC BFSK receiver is
shown in Figure 1 [5].
Figure 1  Block diagram of a NCBFSK receiver (From: [5]).


The received signal is distributed to two distinct paths, one for each frequency.
In each path, the signal is further divided between two branches; one branch is configured
to detect the in-phase (I) signal and the other branch the quadrature (Q) signal of the
respective frequency. Each branch consists of a mixer, an integrator, and a squaring
function. Together, both branches and the summer at their end constitute a non-coherent
matched filter (NCMF). The term non-coherent matched filter means that this filter does
not try to match the carrier phase, but only the envelope of the signal [5, pp. 256-258].


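The structure is symmetric; thus, the analysis made for the case of bit ‘1’
transmitted mirrors the case of bit ‘0’ transmitted, with the roles of the two paths
exchanged. For a bit ‘1’ transmitted, the input to the integrator of the top path is given by [5]:

$$\begin{aligned}
Y_I(t) &= 2s(t)\cos\left[\left(\omega_c + \frac{\Delta\omega}{2}\right)t\right] \\
&= 2\sqrt{2}\,A_c\cos\left[\left(\omega_c + \frac{\Delta\omega}{2}\right)t + \theta_i\right]\cdot\cos\left[\left(\omega_c + \frac{\Delta\omega}{2}\right)t\right] \\
&= \sqrt{2}\,A_c\left\{\cos(\theta_i) + \cos\left[2\left(\omega_c + \frac{\Delta\omega}{2}\right)t + \theta_i\right]\right\},
\end{aligned} \tag{2.2}$$

where $\omega_c = 2\pi f_c$ and $\Delta\omega = 2\pi\Delta f$.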

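Similarly, the input to the other integrator in the top non-coherent matched filter is

$$Y_Q(t) = 2s(t)\sin\left[\left(\omega_c + \frac{\Delta\omega}{2}\right)t\right] = \sqrt{2}\,A_c\left\{-\sin(\theta_i) + \sin\left[2\left(\omega_c + \frac{\Delta\omega}{2}\right)t + \theta_i\right]\right\}. \tag{2.3}$$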
If the right conditions are met, the integrator outputs simplify. When
$f_c$ is chosen to be an integer multiple of half the bit rate and $\Delta f$ is chosen to be an
integer multiple of the bit rate $R_b$, where $R_b = 1/T_b$, only the first terms of the above
expressions survive integration over one bit interval. These conditions are known as
orthogonal signaling [3, pp. 200-204]. Following that restriction, and with the integrator
outputs normalized by the bit duration $T_b$, the outputs of the integrators of the top NCMF are:

$$X_I(T_b) = \sqrt{2}\,A_c\cos\theta_i \tag{2.6}$$

and

$$X_Q(T_b) = -\sqrt{2}\,A_c\sin\theta_i. \tag{2.7}$$

The outputs of the squaring blocks are

$$X_I^2(T_b) = 2A_c^2\cos^2\theta_i \tag{2.8}$$

and

$$X_Q^2(T_b) = 2A_c^2\sin^2\theta_i. \tag{2.9}$$

Summing the outputs of the two branches yields

$$V_1(T_b) = 2A_c^2\left(\sin^2\theta_i + \cos^2\theta_i\right) = 2A_c^2 \tag{2.10}$$

as the output of the top path. The output of the I-channel mixer in the bottom NCMF in
Figure 1 is


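$$\begin{aligned}
Y_I(t) &= 2s(t)\cos\left[\left(\omega_c - \frac{\Delta\omega}{2}\right)t\right] \\
&= 2\sqrt{2}\,A_c\cos\left[\left(\omega_c + \frac{\Delta\omega}{2}\right)t + \theta_i\right]\cdot\cos\left[\left(\omega_c - \frac{\Delta\omega}{2}\right)t\right] \\
&= \sqrt{2}\,A_c\left[\cos(\Delta\omega\cdot t + \theta_i) + \cos(2\omega_c t + \theta_i)\right].
\end{aligned} \tag{2.11}$$

The output of the Q-channel mixer in the bottom NCMF in Figure 1 is

$$\begin{aligned}
Y_Q(t) &= 2s(t)\sin\left[\left(\omega_c - \frac{\Delta\omega}{2}\right)t\right] \\
&= 2\sqrt{2}\,A_c\cos\left[\left(\omega_c + \frac{\Delta\omega}{2}\right)t + \theta_i\right]\cdot\sin\left[\left(\omega_c - \frac{\Delta\omega}{2}\right)t\right] \\
&= \sqrt{2}\,A_c\left[-\sin(\Delta\omega\cdot t + \theta_i) + \sin(2\omega_c t + \theta_i)\right].
\end{aligned} \tag{2.12}$$

The outputs of the integrators of the bottom NCMF, normalized by the bit duration $T_b$, are

$$X_I(T_b) = \frac{\sqrt{2}\,A_c}{2\pi T_b}\left\{\frac{\cos\theta_i}{\Delta f}\sin(2\pi\Delta f\cdot T_b) + \frac{\sin\theta_i}{\Delta f}\left[\cos(2\pi\Delta f\cdot T_b) - 1\right] + \frac{1}{2f_c}\left\{\cos\theta_i\cdot\sin(4\pi f_c T_b) + \sin\theta_i\left[\cos(4\pi f_c T_b) - 1\right]\right\}\right\} \tag{2.13}$$

for the I-channel, and

$$X_Q(T_b) = \frac{\sqrt{2}\,A_c}{2\pi T_b}\left\{-\frac{\sin\theta_i}{\Delta f}\sin(2\pi\Delta f\cdot T_b) + \frac{\cos\theta_i}{\Delta f}\left[\cos(2\pi\Delta f\cdot T_b) - 1\right] + \frac{1}{2f_c}\left\{\sin\theta_i\cdot\sin(4\pi f_c T_b) - \cos\theta_i\left[\cos(4\pi f_c T_b) - 1\right]\right\}\right\} \tag{2.14}$$

for the Q-channel.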


If orthogonal signaling is chosen, i.e., $\Delta f = nR_b$ and $f_c = mR_b/2$, where $n$ and $m$
are integers, the outputs of the integrators in the bottom NCMF simplify to $X_I(T_b) = 0$
and $X_Q(T_b) = 0$. This in turn yields

$$V_2(T_b) = 0 \tag{2.15}$$

and using equations (2.10) and (2.15) the output of the subtraction of the paths is

$$V_1 - V_2 = 2A_c^2. \tag{2.16}$$

For the case that bit ‘0’ is transmitted, the whole process is inverted and the
respective outputs of the two paths would be $V_1(T_b) = 0$ and
$V_2(T_b) = 2A_c^2\left(\sin^2\theta_i + \cos^2\theta_i\right) = 2A_c^2$. Hence, the output of the subtraction of the two
paths is now $-2A_c^2$. Sampling the final output at the end of the duration of each bit
reveals the value of the transmitted bit.
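The result can be checked numerically with a short Python sketch of the energy detector: each path mixes the received samples with cosine and sine references, averages over one bit, squares, and sums, and the two path outputs are then subtracted. The sketch is a discrete-time illustration under assumptions made here (perfect bit timing, no noise, integrators that average over the bit, and hypothetical function names and parameter values); it is not the FPGA implementation.

```python
import math


def tone(bit, fc, delta_f, t, amplitude, theta):
    """One sample of the BFSK signal of equation (2.1) for a single bit value."""
    b = 1 if bit else -1
    return math.sqrt(2) * amplitude * math.cos(2 * math.pi * (fc + b * delta_f / 2) * t + theta)


def ncmf_energy(samples, freq, fs):
    """Non-coherent matched filter: I/Q mixing, averaging over the bit, squaring, summing."""
    n = len(samples)
    x_i = sum(2 * s * math.cos(2 * math.pi * freq * k / fs) for k, s in enumerate(samples)) / n
    x_q = sum(2 * s * math.sin(2 * math.pi * freq * k / fs) for k, s in enumerate(samples)) / n
    return x_i * x_i + x_q * x_q


def decision_statistic(samples, fc, delta_f, fs):
    """Top-path energy minus bottom-path energy for one bit interval."""
    v1 = ncmf_energy(samples, fc + delta_f / 2, fs)   # path matched to bit 1
    v2 = ncmf_energy(samples, fc - delta_f / 2, fs)   # path matched to bit 0
    return v1 - v2


def detect_bit(samples, fc, delta_f, fs):
    """Decide in favor of the path with the higher energy."""
    return 1 if decision_statistic(samples, fc, delta_f, fs) > 0 else 0
```

With the hypothetical values $R_b$ = 1000 bits/s, $f_c$ = 4000 Hz, $\Delta f$ = 1000 Hz, a sampling rate of 32 kHz (32 samples per bit), and $A_c$ = 1, the statistic evaluates to approximately +2 for a transmitted 1 and -2 for a transmitted 0, whatever the phase $\theta_i$, in agreement with the $\pm 2A_c^2$ derived above.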

This implementation clearly relies heavily on proper bit synchronization,
which means that the receiver should know the exact duration of each bit and when each
bit ends. An extra circuit is needed to acquire this information, and when the timing
information is incorrect, severe degradation of the performance of the receiver may
result. Many Timing Error Detectors (TEDs) for discrete-time implementations are
presented in [6], including the Early-Late TED, the Zero Crossing TED, and the Gardner
TED.

In summary, the energies of the two branches of each path are added and compared
to the energy of the other path. The decision about the received bit favors the
bit that corresponds to the frequency of the path with the highest energy. In order to
minimize the leakage of energy from one path into the other, the frequencies used must be
orthogonal, which implies a tone spacing that is a multiple of the bit rate and a center
frequency that is a multiple of half the bit rate [5].

B.  CONVOLUTIONAL ENCODING

Encoding in digital communications is used for forward error correction.
Convolutional codes and block codes are the two most commonly used families of codes.
Convolutional codes were introduced in 1955 by Elias [7].

Convolutional codes are characterized by the code rate $r = k/n$, where $k$ is the
length of the input word and $n$ is the length of the output word, and by the memory order
$m$. The memory order $m$ is the number of memory elements that are included in the
encoder and is a crucial parameter for the performance of a code. Each code can be
uniquely described by a matrix with octal numbers as elements. The number of columns
in this matrix corresponds to the $n$ parameter and the number of rows to the $k$
parameter. The actual value of the octal number reveals the interconnections that yield
the respective output, counting in binary from right to left. In the example taken from [8],
in Figure 2, we can identify an $r = 1/2$ code with a convolutional code array of [133,171].
The number 133 is the octal equivalent of binary 1011011 and corresponds to output $C_1$,
and 171 is the octal equivalent of binary 1111001 and corresponds to output $C_0$. This
specific code is an industry standard code for $m = 6$. The constraint length $K$ for the case
of $k = 1$ is $K = m + 1$. In this thesis, the industry standard convolutional code for $r = 1/2$
and $K = 3$, namely [7 5], is used.
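As an illustration of how the octal generators define the encoder, the following Python sketch implements a rate 1/2 convolutional encoder for the [7 5], $K = 3$ code. It is a software model written for this explanation, not the Xilinx encoder block; the function name and the ordering of the two output bits (generator 7 first, then generator 5) are assumptions.

```python
def conv_encode(bits, generators=(0o7, 0o5), constraint_length=3):
    """Rate 1/n convolutional encoder for k = 1.

    The register holds the current input bit and the K-1 previous bits,
    with the newest bit in the most significant position. Each output bit
    is the modulo-2 sum of the register taps selected by one generator.
    """
    register = 0
    out = []
    for bit in bits:
        # Shift the register and insert the new bit at the top.
        register = ((bit & 1) << (constraint_length - 1)) | (register >> 1)
        for g in generators:
            out.append(bin(register & g).count("1") % 2)  # parity of the tapped bits
    return out
```

For example, `conv_encode([1, 0, 1, 1])` returns `[1, 1, 1, 0, 0, 0, 0, 1]`, that is, the output pairs 11, 10, 00, 01. Because both 7 (binary 111) and 5 (binary 101) read the same in either direction, the bit-ordering convention does not affect this particular code.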


Figure 2  Convolutional Encoder Block Diagram of code rate $r = 1/2$, $k = 1$ (outputs C1 and C0).

Convolutional  encoded  streams  are  usually  decoded  by  Viterbi  decoders,  invented 
by  Viterbi  [9].  Viterbi  decoders  implement  maximum  likelihood  decoding  with  a  slight 
performance  penalty  due  to  finite  decoder  memory  [10]. 
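The decoding idea can be sketched in Python for the [7 5], $K = 3$ code. This is a minimal hard-decision Viterbi decoder that keeps one survivor path per state over the whole message; it is not the Xilinx Viterbi decoder block, and the helper names, the all-zero starting state, and the absence of tail bits or a finite traceback depth are simplifying assumptions.

```python
def _step(state, bit, generators=(0o7, 0o5), constraint_length=3):
    """Advance the encoder by one input bit; return (next_state, output_bits).

    state holds the K-1 previous input bits, newest in the most significant position.
    """
    register = (bit << (constraint_length - 1)) | state
    out = tuple(bin(register & g).count("1") % 2 for g in generators)
    return register >> 1, out


def encode(bits):
    """Reference encoder built on _step, used to produce test data."""
    state, out = 0, []
    for b in bits:
        state, pair = _step(state, b)
        out.extend(pair)
    return out


def viterbi_decode(received, constraint_length=3):
    """Hard-decision Viterbi decoding of a rate 1/2 stream (Hamming distance metric)."""
    n_states = 1 << (constraint_length - 1)
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)      # the encoder starts in the all-zero state
    paths = [[] for _ in range(n_states)]
    for i in range(0, len(received), 2):
        pair = tuple(received[i:i + 2])
        new_metrics = [inf] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metrics[s] == inf:
                continue                         # state not reachable yet
            for bit in (0, 1):
                nxt, out = _step(s, bit)
                d = metrics[s] + sum(a != b for a, b in zip(out, pair))
                if d < new_metrics[nxt]:         # keep the better of the merging paths
                    new_metrics[nxt] = d
                    new_paths[nxt] = paths[s] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(n_states), key=lambda s: metrics[s])
    return paths[best]
```

Storing complete survivor paths is convenient for short messages but is exactly what a hardware decoder avoids: a practical decoder truncates the path memory, which is the source of the slight performance penalty mentioned above.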
This chapter has explained the fundamental principles required to understand the
NCBFSK  transmitter  and  receiver  design  detailed  in  the  remainder  of  this  document.  The 
next  chapter  describes  the  software  and  hardware  design  tools  used  in  this  design  effort. 
III.  DESIGN ENVIRONMENT


Xilinx  offers  a  full  suite  of  programs  that  provides  an  integrated  development 
environment  for  its  FPGAs.  This  suite  is  named  Integrated  Software  Environment  (ISE) 
Design  Suite  and  the  main  programs  that  are  included  are  System  Generator  for  DSP,  ISE 
Project  Navigator,  ChipScope  Pro  Tool,  PlanAhead  and  AccelDSP  Synthesis  Tool  [11]. 
Not  all  of  these  tools  were  used  because  each  program  has  a  very  specific  functionality, 
some  of  which  were  not  needed.  System  Generator  was  used  as  the  main  design  entry  and 
simulation  program  and  ISE  Project  Navigator  as  the  program  that  implements  the  design 
into  the  targeted  Xilinx  device. 

A.  SYSTEM GENERATOR

System Generator is an FPGA design program that offers the necessary libraries of
blocksets, making use of the MathWorks’ Simulink design environment. Simulink is a
schematic tool that is part of Matlab and is known for its efficiency and ease of use
among engineers. For this reason, System Generator (Sysgen) uses this environment
for system modeling, allowing components from Simulink
and Sysgen to be mixed for simulation purposes (Figure 3). Sysgen also provides automatic
code generation, and the resulting design can then be downloaded to Xilinx’s FPGAs. The
Hardware Description Language (HDL) that is used during code generation can be chosen
from the Sysgen token and is either VHDL or Verilog [12].

The blocks offered by Sysgen are guaranteed to be synthesizable, solving a great
problem for the designer. Blocks are schematic components that implement primitive
functions and offer the option of default along with customizable inputs and outputs that
can be interconnected. More complex blocks exist as well, making it possible to
construct a complex design without much effort. A full list and description of all the
available blocks is included in [13] and a more technical description of the Intellectual
Property blocks is included in [14]. Most of these blocks are DSP related and only a few
are dedicated to communications. In the latter case, an extra license is usually needed in
order to integrate them into the design. Their color is green by default and is clearly
shown in Figure 3. The block ‘System Generator’ is mandatory in every design and the
blocks ‘Gateway In’ and ‘Gateway Out’ define the limits of the part of the design that is
going to be translated into an FPGA circuit. The current version of Sysgen is 10.1 with
Service Pack 2.


Figure  3  illustrates  a  very  simple  example,  where  a  Finite  Impulse  Response 
(FIR)  Filter  is  designed.  The  input  is  supplied  by  Matlab  and  the  output  is  viewed  by 
double  clicking  on  the  ‘Scope.’  The  parameters  of  the  single  Xilinx  block  used  are 
defined  in  the  respective  window  that  appears  when  the  FIR  block  is  selected.  Neither  of 
the  Simulink  blocks,  ‘From  Workspace’  and  ‘Scope,’  are  synthesizable.  They  are  only 
used  during  the  design  phase  for  simulation  purposes. 
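For readers unfamiliar with the operation performed by the FIR block in this example, the following Python sketch shows a direct-form FIR filter operating on one sample at a time, much as a hardware block does. It is a behavioral illustration only; the function name and the coefficients used in the usage note are hypothetical and are not taken from Figure 3.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    delay_line = [0.0] * len(coeffs)     # holds the most recent inputs, newest first
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift in the new sample
        out.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return out
```

Feeding a unit impulse returns the coefficients themselves: `fir_filter([1, 0, 0, 0], [0.25, 0.5, 0.25])` gives `[0.25, 0.5, 0.25, 0.0]`. In the Simulink model, the role of the input list is played by the ‘From Workspace’ block and the output is inspected with the ‘Scope’ block.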


Other parameters that are common to many Sysgen blocks are the format and
width of the output values [13, p. 44]. There are blocks dedicated to manipulating the data
type and altering its internal structure. For example, the Enable and Reset signals are only
allowed to be Boolean; thus, an unsigned one-bit integer must be reinterpreted as a
Boolean number. This is accomplished by the blocks ‘Reinterpret’ or ‘Convert.’


Figure 3  Example of the environment and the blocks offered by Sysgen.
Another block of special use is ‘MCode’ [13, p. 239]. It allows writing a program
in Matlab and saving it in the block. Sysgen is then responsible for synthesizing this
program. There are many constraints regarding the commands that can be used in such a
program. For example, division by a number that is not a power of two is not
supported. Nevertheless, this block is very useful for describing state machines, and as such,
it has been used many times in the BFSK design.
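To give an idea of the kind of state machine that suits such a block, the sketch below models a two-state packet controller as a function that is called once per bit clock and receives its previous state explicitly. It is written in Python rather than Matlab and is purely hypothetical: the state names, the `preamble_found` input, and the `data_valid` output are assumptions made for illustration, and only the packet and preamble lengths are taken from Chapter I.

```python
SEARCH, RECEIVE = 0, 1   # hypothetical state encoding


def packet_fsm(state, count, preamble_found, packet_bits=128, preamble_bits=8):
    """Evaluate one clock tick; return (next_state, next_count, data_valid).

    SEARCH  : wait until the preamble detector fires.
    RECEIVE : flag each following bit as payload until the packet is complete.
    """
    if state == SEARCH:
        if preamble_found:
            return RECEIVE, 0, False
        return SEARCH, 0, False
    # RECEIVE state: count the payload bits of the current packet.
    count += 1
    if count == packet_bits - preamble_bits:
        return SEARCH, 0, True       # last payload bit, go back to searching
    return RECEIVE, count, True
```

The function uses only comparisons, additions, and constant assignments, which is the style of code that stays within the restrictions described above.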

B.  ISE PROJECT NAVIGATOR

After the design is finished and the code in the HDL of
choice is generated via Sysgen, the source file is loaded into the ISE Project Navigator as
a project. Project Navigator is responsible for the synthesis, implementation, and
verification of the design and for the configuration of the target device [15].

After  loading  a  project  created  by  Sysgen,  source  files  can  be  added,  created  or 
modified.  Other  available  processes  under  the  Processes  Window,  as  shown  in  the  left 
column  in  Figure  4,  are  as  follows; 

•  Add  timing  constraints  or  define  Input  Output  (10)  pins  under  User 
Constraints  choice. 

•  Synthesize  the  project  or  generate  post-synthesis  simulation  under 
Synthesize  -XST.  At  this  step  HDL  programs  are  converted  to  netlist  files 
that  are  used  by  the  implementation  step. 

•  Translate  the  logical  design  (netlist  file)  to  a  physical  file  format,  to  make 
the  mapping  of  the  design  to  the  FPGA,  and  to  place  and  route  the 
mapping  to  the  FPGA  of  choice  under  Implement  Design  choice.  The 
placement  step  includes  the  decision  made  by  the  program  regarding 
where  to  place  the  logic  elements  given  the  internal  structure  of  the  target 
device.  Then,  routing  is  responsible  for  finding  the  optimized  connecting 
paths  between  these  placed  components. 

•  Generate  the  programming  file  that  will  be  installed  into  the  FPGA  under 
Generate  Programming  File, 

•  Configure  Target  Device,  and 

• Use the ChipScope Pro program to verify the actual implementation in
the FPGA under Analyze Design Using Chipscope. Every step of the
implementation process described above has its own tools for testing and
simulating the design. ChipScope is responsible for checking the functionality
of the final design installed in the FPGA.

Figure 4 Project Navigator Main Window.

C.  AVNET  BOARD 

The mainboard used for the project is designed by Avnet and is called the
Xilinx® Virtex™-4 LX LC Development Kit. It is interconnected with the Analog to Digital
(A/D) and Digital to Analog (D/A) Converter module P160, also provided by Avnet.

The mainboard’s key features are the Virtex XC4VLX25 FPGA, a 10/100 Ethernet
interface and 64 MB of Double Data Rate (DDR) Synchronous Dynamic Random Access
Memory (SDRAM). The Virtex XC4VLX25 is an entry-level FPGA of the Virtex-4 family
and contains 24,192 logic cells and 48 dedicated DSP cells called XtremeDSP (18-bit x
18-bit, two’s complement, signed multiplier). It is manufactured using the 90 nm copper
CMOS process and it cannot use the embedded PowerPC
405 processor core, due to size constraints [16]. The Analog Module P160 features two 12-bit 53
Msps A/D converters and two 12-bit 165 Msps D/A converters, yielding much flexibility
for the design [17]. Nevertheless, this module has not been used in any test in this
research, mainly due to time constraints. The description of its pins and interfaces is in
[18].

The literature recommended for Sysgen and ISE is limited to the Xilinx
manuals. These manuals are included in a help guide offered by Xilinx as an
internet-accessible Acrobat file [19]. For System Generator there is also a manual that includes
introductory labs and block and program reference manuals in its support page under the
documentation tab and the Design Tool choice [20]. Extensive documentation of the most
complex blocks is given in the same page under the IP Cores choice [21]. For the ISE
project manager the documentation can be reached through the help guide stated above
after choosing ‘ISE Help’ [15].

Sysgen and ISE Project Manager were extensively used for the design and the
generation of the programming file of the non-coherent Binary Frequency Shift Keying
Transmitter-Receiver presented in the next chapter. The plethora of tools offered by these
programs made the design straightforward, compared to writing directly in an HDL.
Xilinx also supports its programs online, making troubleshooting
easier.
IV. DESIGN FLOW


In  this  chapter,  the  logic  flow  of  the  design  is  discussed  in  detail.  The  basic 
principles  of  the  BFSK  transmitter  and  receiver  illustrated  in  Chapter  II  are  implemented 
in  Simulink  using  Xilinx’s  blocks.  The  transmitter  and  receiver  are  separated  into  two 
different designs. The design is further detailed on a per-figure and per-block basis in
Appendix B, where key parameters and Matlab code, where applicable, are also
given.

A.  TRANSMITTER 

The transmitter, illustrated in Figure 5, is the combination of three distinct parts:
the  preamble,  the  data  input  and  the  modulation  circuitry.  The  data  is  transmitted  in 
blocks  of  120  bits.  An  eight  bit  preamble  with  pattern  10101001  is  attached  in  front  of 
every  packet  to  facilitate  packet  synchronization  at  the  receiver.  For  simulation  purposes, 
Simulink’s  blocks  ‘From  Workspace’  and  ‘To  Workspace’  were  used  to  supply  the 
design  with  input  bits  and  store  them,  respectively.  The  results  were  also  visually  verified 
at  each  stage  using  ‘Scope’  blocks. 

Figure 5 Transmitter’s schematic diagram designed in Simulink/Sysgen environment.
(Diagram annotation: preamble with sequence 10101001; the sequence repeats every 128 channel bits.)
1. Preamble Subsystem

The Preamble Subsystem (Figure 6) is responsible for the attachment of the
preamble at the start of each packet. This subsystem is also responsible for blocking
the data bits whenever the preamble is transmitted and for controlling the multiplexer ‘Mux1’
in Figure 5, which selects the data or the preamble.

The counter drives two blocks. It counts up to 127 and restarts from 0. While the
counter’s output is seven or less, the preamble is valid and is read out to the modulation
subsystem via the multiplexer ‘Mux1’ in Figure 5. During this time, ‘read_out’ and ‘preamble_invalid’
are low and the output of the counter is directly translated to an address in the ‘ROM’
block. The content of this address appears at the ‘ROM’ output and passes through
‘Mux1’ in Figure 5 to the Modulation Subsystem. ‘Mux1’ is switched to the correct
position by the ‘preamble_invalid’ signal. ‘read_out’ is responsible for blocking the message
bits and letting them be stored in a memory while the preamble is transmitted. The signals
‘preamble_invalid’ and ‘read_out’ take the same values and have different names merely
for illustration purposes.
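The counter, ROM and multiplexer control described above can be sketched in software. The following Python model is only an illustration under stated assumptions: the function names are hypothetical, a plain list stands in for the FIFO, and the doubled preamble clock noted in Figure 6 is ignored so that one loop iteration corresponds to one channel bit.

```python
PREAMBLE = [1, 0, 1, 0, 1, 0, 0, 1]   # contents of the 'ROM' block (pattern 10101001)
PACKET_LEN = 128                      # the counter counts 0..127 and then wraps to 0


def preamble_subsystem(count):
    """Return (rom_bit, preamble_invalid) for one counter value (hypothetical helper)."""
    preamble_invalid = count > 7      # low while the counter output is seven or less
    rom_bit = PREAMBLE[count] if not preamble_invalid else 0
    return rom_bit, preamble_invalid


def build_channel_bits(encoded_bits):
    """Insert the preamble in front of every block of 120 encoded bits."""
    fifo = list(encoded_bits)         # stands in for the FIFO of the Data Input Subsystem
    channel = []
    count = 0
    while fifo:
        rom_bit, preamble_invalid = preamble_subsystem(count)
        read_out = preamble_invalid   # same value, different name, as in the text
        if read_out:
            channel.append(fifo.pop(0))   # 'Mux1' selects the data path
        else:
            channel.append(rom_bit)       # 'Mux1' selects the preamble path
        count = (count + 1) % PACKET_LEN
    return channel
```

Feeding 240 encoded bits into `build_channel_bits` gives two 128-bit packets, each beginning with 10101001.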


Figure 6 Preamble Subsystem. (Diagram annotations: the ‘ROM’ stores the preamble sequence; the ‘read_out’ signal releases the transmission of the information bits. Note in the diagram: each information bit yields two channel bits because of the rate 1/2 convolutional code, which is why the preamble’s clock is set at twice the speed.)
2. Data Input Subsystem


The Data Input Subsystem (Figure 7) is responsible for the convolutional
encoding of the input sequence with a rate 1/2 code and the subsequent storage of the
encoded bits in a First In First Out (FIFO) memory. The two streams created by the
‘Convolutional Encoder’ block are merged back into one stream by the ‘Concat’ and ‘Parallel
to Serial’ blocks. It should be noted that these last two blocks can be replaced by a ‘Time
Division Multiplexer’ block. The bit period of the final stream is half the period of the
message bits due to the encoding with rate 1/2. In the ‘Convolutional Encoder’
parameters window, the constraint length was set to 3, meaning that the encoder uses
a register of two flip-flops. The encoding vector of choice was [7 5], as explained in
Section B in Chapter II.
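As an illustration of the encoding just described, the following Python sketch implements a textbook rate 1/2 convolutional encoder with constraint length 3 and generators [7 5] (octal). It is not the Xilinx block itself. The order in which the two output streams are interleaved is an assumption, and the function name is hypothetical.

```python
def conv_encode(bits):
    """Rate 1/2, constraint length 3 encoder with generators 7 (111) and 5 (101)."""
    s1 = s2 = 0                       # the two flip-flops; s1 holds the most recent bit
    out = []
    for u in bits:
        out.append(u ^ s1 ^ s2)       # generator 7: taps on u, s1 and s2
        out.append(u ^ s2)            # generator 5: taps on u and s2
        s1, s2 = u, s1                # shift the register
    return out
```

Every message bit yields two encoded bits, which is why the bit period of the merged stream is half that of the message bits.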


Figure 7 Data Input Subsystem.

After being stored in the ‘FIFO’ memory, the data waits for the enable signal of
the Preamble Subsystem in order to exit. At the same time, the multiplexer ‘Mux1’ in
Figure 5 is switched to the correct position to allow the propagation of the input data to
the last subsystem. Each bit produces an FSK symbol with a duration of 64 samples in the
Modulation Subsystem. This parameter can generally be adjusted from the panel of the
blocks under the title ‘Explicit sample period.’




3. Modulation Subsystem

The modulation subsystem, illustrated in Figure 8, uses each bit that appears at its
input to choose between the two frequencies. This is accomplished by a multiplexer
‘Mux,’ where the selection pin (sel) is driven by the forwarded bits and the multiplexer
data inputs are driven by two Direct Digital Synthesizers (DDSs). Each DDS generates a
sine wave at one of the two frequencies for the BFSK signal. The DDS is a digital
sinusoid generator and can produce frequencies up to half the frequency at which the
DDS core is clocked, i.e., the DDS clock rate, in order not to exceed the Nyquist
frequency [22]. For the Xilinx Virtex-4, which can achieve clock speeds of 500 MHz
[23], the limit for the output frequency of the DDS is 250 MHz when the DDS clock rate
is set to the maximum possible frequency. Nevertheless, much lower frequencies were
used: the frequencies for 1 and 0 are 45 MHz and 40 MHz, respectively. Given that
the encoded bit rate of choice is R = 1.5625 Mbps, the two frequencies are not orthogonal
based on the definition given in Section A in Chapter II. Even though this design choice
may degrade the performance in a noisy environment, it does not have any noticeable
impact on the noiseless analysis that follows. The ‘Shift’ block plays the role of
amplification, multiplying the signal by a factor of four before transmission. Pulse
shaping is not used in this design.
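The non-orthogonality of the two tones can be checked numerically. The sketch below assumes the usual textbook condition for non-coherent detection, namely that the tone spacing be an integer multiple of the bit rate; the exact definition of Chapter II is not reproduced here. The 100 MHz sample rate is inferred from 64 samples per bit at 1.5625 Mbps, and all names are hypothetical.

```python
import cmath
import math

F1_HZ = 45e6                 # tone for bit 1
F0_HZ = 40e6                 # tone for bit 0
BIT_RATE = 1.5625e6          # encoded bit rate R
SAMPLES_PER_BIT = 64
FS_HZ = BIT_RATE * SAMPLES_PER_BIT   # inferred sample rate (100 MHz)


def spacing_in_bit_rates():
    """Tone spacing expressed as a multiple of the bit rate."""
    return (F1_HZ - F0_HZ) / BIT_RATE


def is_orthogonal(tol=1e-9):
    """True when the spacing is an integer multiple of the bit rate (assumed condition)."""
    ratio = spacing_in_bit_rates()
    return abs(ratio - round(ratio)) < tol


def normalized_cross_correlation():
    """Magnitude of the complex correlation of the two tones over one bit, scaled to [0, 1]."""
    acc = sum(cmath.exp(2j * math.pi * (F1_HZ - F0_HZ) * n / FS_HZ)
              for n in range(SAMPLES_PER_BIT))
    return abs(acc) / SAMPLES_PER_BIT
```

With these values the spacing is 3.2 bit rates, so the integer condition fails, while the residual cross-correlation stays small. This agrees with the remark that the choice has no noticeable impact in the noiseless case.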


Figure 8 Modulation Subsystem. (Diagram annotations: if the select input is 0, the multiplexer selects the frequency at input d0; otherwise it selects the other input. The state machine waits for the first 1 of the preamble in order to enable the multiplexer.)
The ‘Mcode 1’ block in Figure 8 is used for initialization. At the beginning of the
simulation, many signals inside the blocks start in undefined states and other blocks, like
the multiplexer, cannot propagate these kinds of signals. A block was needed that would enable the
multiplexer after the propagation of the undefined signals, without affecting
the overall performance of the design. Usually, a constant enable signal is used along
with a delay measured exactly to overcome this problem. Instead, a very simple Matlab program
was written that takes advantage of the fact that the first bit of the preamble is 1. Upon
detection of the first 1 in the channel, ‘Mcode 1’ enables the multiplexer without any
further interruption. It should be noted that the command xfix({xlBoolean}, 0)
was used in the program in order to avoid the use of a ‘Convert’ block. Otherwise, any
value assigned as 0 or 1 in Matlab code is translated to an unsigned integer and cannot
be used as it is to drive the enable port (en) of the ‘Mux.’ The xfix() command explicitly
converts to the type described by the first argument. In this case, the value 0 is assigned as
a Boolean type and not as an integer [13, p. 243].
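The behavior of this initialization block can be modeled as a one-way latch. The Python class below is a hypothetical sketch of that behavior only; it does not reproduce the original MCode program, whose source is given in Appendix B.

```python
class EnableLatch:
    """Stays disabled until the first 1 is seen, then stays enabled."""

    def __init__(self):
        self.enabled = False          # Boolean from the start, like xfix({xlBoolean}, 0)

    def step(self, bit):
        """Process one channel bit and return the enable signal for the multiplexer."""
        if bit == 1:
            self.enabled = True       # latch on the first 1 of the preamble
        return self.enabled
```

Because the state never returns to False, later 0 bits in the data do not interrupt the multiplexer.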

B.  RECEIVER 

The non-coherent BFSK receiver is illustrated in Figure 9. The choice of a non-coherent
(NC) receiver design was made to eliminate any need for an extra circuit that
would extract the phase information from the received signal. The receiver consists of the
following subsystems: the two Correlators, the Decision Circuit, the Timing Circuit, the
two Non-Coherent Matched Filters and the Decoding Subsystem. The Correlators [24]
and the Timing Circuit form the feedback path and the Non-Coherent Matched Filters
and the Decoding Subsystem form the feed-forward path. The mixers are parts of both
paths and are shown explicitly in the figure. The ‘Relational’ block compares the non-coherent
matched filters’ outputs and decides the value of the received bit. The circuit
designed closely matches the theoretical diagram found in the introduction of the BFSK
scheme in Figure 1 in Chapter II, with the addition of a time synchronization circuit and a
Decoding Subsystem.
Figure 9 Receiver’s schematic diagram designed in Simulink environment.

1. Non-Coherent Matched Filter Subsystem

A  non-coherent  matched  filter  is  introduced  in  Section  A  in  Chapter  II.  The 
implementation  of  this  filter  in  the  BFSK  receiver  includes  an  integrator  that  integrates 
the input signal over the duration of a bit period. Thus, correct timing for the specific
design means the correct identification of the beginning of each bit in order to integrate
over  the  correct  time  frame.  This  fact  generates  the  need  for  a  timing  feedback  circuit  that 
will  make  this  information  available. 

The  NC  Matched  Filter  Subsystem  in  Figure  10  has  two  filters  where  each  one 
consists  of  two  branches.  The  two  branches  correspond  to  the  sine  and  the  cosine  at  the 
symbol  frequency.  Each  branch  consists  of  a  mixer  (illustrated  in  Figure  9  before  NC 
Matched  Filter  Subsystem),  an  accumulator,  and  a  squaring  block.  Then,  the  two 
branches’ outputs are added together to give the final output of each filter. The output
values of the two filters are compared in order to decide which frequency was
transmitted. The frequency that was transmitted corresponds to the filter with the higher
output value.
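The two-branch structure described above (mix, accumulate, square, add, compare) can be sketched as follows. This is a floating-point illustration, not the fixed-point Sysgen design. The 100 MHz sample rate is inferred from 64 samples per bit at 1.5625 Mbps, perfect bit timing is assumed, and the function names are hypothetical.

```python
import math

FS_HZ = 100e6                # inferred sample rate
SAMPLES_PER_BIT = 64
F0_HZ, F1_HZ = 40e6, 45e6    # tones for bit 0 and bit 1


def nc_energy(samples, freq_hz):
    """One non-coherent matched filter: cosine and sine branches, squared and added."""
    i_acc = 0.0
    q_acc = 0.0
    for n, x in enumerate(samples):
        phase = 2 * math.pi * freq_hz * n / FS_HZ
        i_acc += x * math.cos(phase)      # mixer and accumulator, cosine branch
        q_acc += x * math.sin(phase)      # mixer and accumulator, sine branch
    return i_acc ** 2 + q_acc ** 2


def decide_bit(samples):
    """Role of the 'Relational' block: pick the filter with the higher output."""
    return 1 if nc_energy(samples, F1_HZ) > nc_energy(samples, F0_HZ) else 0


def tone(freq_hz, phase):
    """One bit period of a test tone with an arbitrary, unknown phase."""
    return [math.cos(2 * math.pi * freq_hz * n / FS_HZ + phase)
            for n in range(SAMPLES_PER_BIT)]
```

Since the squared branches are added, the decision does not depend on the phase of the incoming tone, which is the point of the non-coherent structure.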

The accumulator included in the NC Matched Filter Subsystem is the
discrete-time equivalent of an integrator and it adds 64 consecutive values of the input
signal before it is reset by the feedback timing circuit. Every accumulator is followed by
a FIFO memory, which reads the output of the accumulator only just before the
accumulator’s reset signal is raised. In this way, the memory captures only the last value
of the respective sum. The rest of the block is straightforward, with a squaring block and
an adder that adds the signals of the two branches, yielding a single output from the
subsystem. The downsampling implemented in all branches between the FIFO memory and
the squaring blocks is used to avoid unneeded computational load. After
the accumulation of the correct 64 samples of a bit and the subsequent storage of this
value in a FIFO memory element, the memory yields the same output for 64 consecutive
time units. Thus, it is not necessary to do the computations for all values.


Figure  10  Non-Coherent  Matched  filter  subsystem  (one  of  two). 
2. Timing Circuit

Synchronization circuits are categorized as data-aided and non-data-aided (or
blind); the latter require no training data sequence [25]. As was mentioned previously
in the transmitter description in Chapter IV, this design uses a data-aided circuit for the
acquisition of bit synchronization. The preamble is a known pattern that helps to
identify not only the start time of each bit, but the commencement of each packet as well.

In this design, the feedback synchronization circuit is separated into three
subsystems: the two Correlators and the Decision Circuit. The Correlators (Figure 11)
work similarly to the NC Matched Filter Subsystem, with the main difference being that the
accumulators have been replaced by Finite Impulse Response (FIR) filters. These filters
constitute sliding window accumulators of the last 64 samples. In order to make a
decision regarding the beginning and end of a bit, a circuit that updates its output at every
received sample is needed. The correct timing is extracted from the maxima and
minima of this output. In contrast, the feed-forward path with the non-coherent matched
filters needs only to accumulate the proper values and then yield a different output once
every 64 samples and not every sample.
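The difference between the two kinds of accumulator can be illustrated with a short sketch. The Python functions below are hypothetical stand-ins: the first models the accumulate-and-reset behavior of the feed-forward path, assuming the reset is already aligned with sample 0, and the second models the sliding window of the Correlators as a running sum over the last 64 samples.

```python
from collections import deque

WINDOW = 64


def integrate_and_dump(samples, window=WINDOW):
    """Feed-forward path: one output per block of `window` samples."""
    out = []
    for start in range(0, len(samples) - window + 1, window):
        out.append(sum(samples[start:start + window]))
    return out


def sliding_accumulator(samples, window=WINDOW):
    """Feedback path: sum of the last `window` samples, updated at every sample."""
    buf = deque([0] * window, maxlen=window)   # delay line, initially empty
    total = 0
    out = []
    for x in samples:
        total += x - buf[0]    # add the newest sample, drop the oldest one
        buf.append(x)          # maxlen discards the oldest entry automatically
        out.append(total)
    return out
```

For 128 input samples the first function returns two values, whereas the second returns 128, one per received sample, which is what the extremum search in the Decision Circuit requires.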

In Figure 11, the FIR is shown to be a custom FIR filter and not an off-the-shelf
block provided by Xilinx. The reason is analyzed in the troubleshooting
section, but for the moment, it can be thought of as an FIR filter with impulse response

$$h(n) = \sum_{n_0=0}^{63} \delta(n - n_0), \quad \text{where} \quad \delta(n) = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}$$

The initialization block, as in the case of the
block ‘Mcode 1’ of the transmitter, is used only to prevent the undefined initial signals
from propagating and to suppress errors during the simulation. It consists of a comparator
that has two delayed versions of 1 at its inputs, thus propagating an initial reset-high
signal once at the beginning of the simulation.

The difference of the outputs of the two correlators is the input to a logic block
(‘Mcode’ block in Figure 12) that searches for maxima and minima of the input
waveform. Given that the correlator yields a maximum when the correct 64 samples of a
bit have been added, the expectation is that the 1’s correlator will output a much higher
value than the 0’s correlator when the whole first bit of the preamble has just been
received. The opposite is expected at the second bit of the preamble, because it has the
frequency corresponding to the 0 bit. Thus, the difference waveform is expected to be a
maximum after receiving a 1 at the exact moment that all 64 samples of that 1 have
entered the filter. Following the same reasoning, the difference waveform is expected to
be a minimum after receiving a 0 at the exact moment that all 64 samples of that 0 have
entered the filter. However, when two consecutive equal bits are received, the result is
different. The output of the filter will reach an extremum at the moment that all the 64
samples of the first bit have entered the filter, and then remain at that extremum for the
following 64 samples, corresponding to the second bit. Therefore, the filter output
displays a plateau effect, which is less useful for symbol synchronization. After the
identification of maxima and minima, a state machine tries to verify when the correct
pattern of the preamble has been received. When this is the case, the timing of the bits is
well known and this information is supplied to the accumulators of the NC Matched
Filter Subsystem. This part is included in the Decision Circuit shown in Figure 12.
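The location of the extrema and the plateau effect can be reproduced with an idealized model. In the sketch below, every sample of a 1 is assumed to contribute +1 to the difference waveform and every sample of a 0 to contribute -1. This linear model is a simplification of the squared correlator outputs, chosen only to show where the extrema fall, and the function name is hypothetical.

```python
SAMPLES_PER_BIT = 64


def difference_waveform(bits, n=SAMPLES_PER_BIT):
    """Idealized sliding-window difference waveform for a sequence of bits."""
    contrib = [1 if b else -1 for b in bits for _ in range(n)]
    out = []
    total = 0
    for i, c in enumerate(contrib):
        total += c                     # newest sample enters the window
        if i >= n:
            total -= contrib[i - n]    # sample that is n positions old leaves the window
        out.append(total)
    return out
```

For the bits 1, 0, 1 the waveform peaks at sample 63, reaches its minimum at sample 127 and peaks again at sample 191, i.e., exactly at the bit boundaries. For the bits 1, 1 it stays at its maximum from sample 63 to sample 127, which is the plateau described above.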

The timing is first extracted in absolute time values. That is, a free running
counter, i.e., ‘Counter 3,’ starts counting from the moment the circuit starts working up to
the moment it stops. When an event occurs, the time that is captured is relative to the
power-up time of the circuit. The information needed by the accumulators of the forward
path is at what instant of a 64-cycle period they should stop accumulating the previous bit
and start accumulating the new bit. The timing must be translated to time modulo 64 and
then stored for the rest of the duration of the packet. This is done by the Slice and
Register blocks in Figure 12. The extra delay introduced by the timing circuit in the
feedback path must also be considered. This is performed by the ‘AddSubS’ and
‘Constant 7’ blocks in Figure 12. The exact value of ‘Constant 7’ was determined
experimentally.
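The translation from absolute time to time modulo 64 amounts to keeping the six least significant bits of the counter value, which is what a slice of a binary counter provides. The sketch below illustrates this under assumptions: the loop delay of 5 samples used in the example is hypothetical (the thesis only states that the value was found experimentally), the delay is assumed to be subtracted, and the function names are not taken from the design.

```python
def bit_boundary_phase(absolute_count, loop_delay, samples_per_bit=64):
    """Position of the bit boundary within a 64-cycle period.

    `samples_per_bit` must be a power of two so that masking the low bits
    is equivalent to taking the value modulo `samples_per_bit`.
    """
    mask = samples_per_bit - 1            # 0b111111 for 64 samples per bit
    return (absolute_count - loop_delay) & mask


def reset_now(counter_value, phase, samples_per_bit=64):
    """True on the samples where the forward-path accumulators should restart."""
    return (counter_value & (samples_per_bit - 1)) == phase
```

For example, an event captured at absolute count 1000 with a hypothetical loop delay of 5 gives a stored phase of 35, and the accumulators would then restart whenever the free running counter equals 35 modulo 64.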

The last three bits of the preamble are not used by the state machine. The time
that corresponds to the last two 0s is provided to the timing circuit in order to ensure a
timely and accurate synchronization of the main circuit. Additionally, the very last bit of
the preamble, the final 1, is used by the ‘MCode’ block in the Decoding Subsystem along with the
signal ‘preamble end’ in order to identify the beginning of each packet.


Figure 11 Correlator’s Subsystem (one of two).

Figure 12 Decision Circuit. (Diagram annotations: the state machine tries to verify when the whole preamble has been detected in order to extract timing information; after a successful detection, it waits for the remaining bits of the packet and then starts again.)
Upon reception of the fifth bit of the preamble, the state machine ‘state_receiver’
stays  locked  for  the  rest  of  the  packet  and  then  it  starts  searching  for  a  new  preamble  after 
the  time  assigned  for  the  current  packet  elapses.  The  decision  circuit  also  provides  an 
output,  the  ‘preamble  end  signal’  that  helps  the  Decoding  Subsystem  to  identify  and 
block  the  preamble  from  the  output,  thus  rebuilding  the  initial  data  sequence. 
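The lock-and-wait behavior of the state machine can be sketched as follows. This is a simplified, hypothetical model and not the ‘state_receiver’ code of Appendix B. It receives one event per sample (‘max’, ‘min’ or None), does not check the spacing between extrema, and assumes that the hold time equals the remaining 123 channel bits of a 128-bit packet.

```python
EXPECTED = ['max', 'min', 'max', 'min', 'max']   # extrema produced by the bits 10101
SAMPLES_PER_BIT = 64
PACKET_BITS = 128


class PreambleDetector:
    def __init__(self):
        self.matched = 0     # number of preamble bits matched so far
        self.hold = 0        # samples left before the search resumes

    def step(self, event):
        """Process one sample; return True on the sample where the fifth bit is confirmed."""
        if self.hold > 0:                # locked: ignore everything until the packet ends
            self.hold -= 1
            return False
        if event is None:
            return False
        if event == EXPECTED[self.matched]:
            self.matched += 1
        else:                            # mismatch: a 'max' may start a new preamble
            self.matched = 1 if event == 'max' else 0
        if self.matched == len(EXPECTED):
            self.matched = 0
            self.hold = (PACKET_BITS - len(EXPECTED)) * SAMPLES_PER_BIT
            return True
        return False
```

The value returned on the fifth matched extremum plays the role of the ‘preamble end signal’ in this sketch.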

3. Decoding Subsystem

The Decoding Subsystem, illustrated in Figure 13, accepts as input the result of
the comparison of the two non-coherent matched filter outputs, which is a sequence of 0s
and 1s, and tries to locate the last preamble bit. The acquisition of the beginning of the
preamble may or may not be correct, because the Decision Circuit has not yet finished
the extraction of timing information. However, after the fifth bit of the preamble, the
receiver is synchronized to the incoming signal. Thus, the Decoding Subsystem uses the
information of the ‘preamble end signal,’ which is set when the acquisition of the fifth bit
of the preamble is accomplished. The following two 0s, i.e., the sixth and seventh bits of
the preamble, are sacrificed to ensure the timely propagation of the information through
the whole circuit and the last bit of the preamble is used to signal the commencement of
the information bits. The Mcode block ‘preamble_detacher’ is a simple state machine
that incorporates this logic to allow the storage of input bits only after
the identification of the last bit of the preamble. The ‘FIFO 4’ memory is driven by a read
enable signal. This enable signal is delayed by enough time to accommodate the total
duration of the preambles that are removed. This is accomplished by block ‘Delay 4’ in
Figure 13.




Figure  13  Decoding  Subsystem. 

Although the length of the packet has been taken into account by the Decision
Circuit, the ‘preamble_detacher’ counts the bits after the preamble again in order to
achieve better synchronization. An external clock, i.e., the ‘Counter 1’ block in Figure
13, is used as a reference of the pulse clock time of the last preamble bit. After 120 clock
cycles, the write enable (we) goes low, disabling the FIFO, and the ‘preamble_detacher’
waits for the next ‘preamble end’ signal. The clock ‘Counter 1’ is an 18-bit,
free running counter. It should be noted that all signals in this subsystem
change every bit period and not every sample period. Block ‘Down Sample’ in Figure
13 downsamples the ‘preamble end’ signal. Notice that the ‘channel bits’ signal has
already been downsampled in the previous subsystem.

Concluding  the  description  of  Decoding  Subsystem,  the  received  bits  are  the  input 
for  a  Time  Division  Demultiplexer  (TDD)  which  is  connected  to  a  Viterbi  decoder  as 
discussed  in  Section  B  in  Chapter  II.  The  input  sequence  to  the  TDD  is  the  encoded  bit 
stream  first  produced  in  the  convolutional  encoder  in  the  transmitter  (Figure  7).  The  TDD 
is  responsible  for  separating  this  sequence  back  to  two  different  streams  in  order  to 
supply the proper inputs to the ‘Viterbi Decoder.’ The parameter settings for the encoder
and decoder are the same. This decoding produces the received message bits, which will
ordinarily be identical to the sent message bits. Exceptions to this can be caused by
decoding  errors,  which  can  occur  when  the  received  signal  is  corrupted  by  excessive 
noise,  interference,  or  fading  [26]. 
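The demultiplexing and decoding steps can be illustrated with a textbook hard-decision Viterbi decoder for the same rate 1/2, constraint length 3, [7 5] code. The sketch below is not the Xilinx ‘Viterbi Decoder’ block: it keeps full survivor paths instead of a finite traceback, the order of the two streams is an assumption, and all function names are hypothetical.

```python
def encode_step(state, u):
    """One encoder step; state = 2*s1 + s2, where s1 is the most recent bit."""
    s1, s2 = state >> 1, state & 1
    return (u ^ s1 ^ s2, u ^ s2), (u << 1) | s1


def conv_encode(bits):
    state, out = 0, []
    for u in bits:
        (a, b), state = encode_step(state, u)
        out += [a, b]
    return out


def demultiplex(channel_bits):
    """Role of the Time Division Demultiplexer: split into the two encoder streams."""
    return channel_bits[0::2], channel_bits[1::2]


def viterbi_decode(stream_a, stream_b):
    """Hard-decision Viterbi decoding with Hamming distance as the branch metric."""
    inf = float('inf')
    metrics = [0, inf, inf, inf]          # the encoder is assumed to start in state 0
    paths = [[], [], [], []]
    for a, b in zip(stream_a, stream_b):
        new_metrics = [inf] * 4
        new_paths = [None] * 4
        for state in range(4):
            if metrics[state] == inf:
                continue                  # state not reachable yet
            for u in (0, 1):
                (ea, eb), nxt = encode_step(state, u)
                m = metrics[state] + (ea != a) + (eb != b)
                if m < new_metrics[nxt]:  # keep the survivor path for each state
                    new_metrics[nxt] = m
                    new_paths[nxt] = paths[state] + [u]
        metrics, paths = new_metrics, new_paths
    best = min(range(4), key=lambda s: metrics[s])
    return paths[best]
```

Without channel errors the decoder returns the message exactly. A single flipped channel bit early in the example stream used in the test is also corrected, while heavier corruption can produce the decoding errors mentioned above.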

The  low  level  description  of  the  circuit  that  was  discussed  in  this  chapter  is 
validated  in  the  next  chapter,  along  with  the  results  and  the  weaknesses  of  the  design. 
Furthermore,  the  lessons  learned  during  the  design  process  are  also  included  as  a  deposit 
of  knowledge  for  follow  on  research  in  this  domain. 
V. DESIGN VALIDATIONS, RESULTS AND TROUBLESHOOTING


The design has been verified in the Simulink environment and a critical part of the
design  has  been  verified  using  Matlab  code-based  simulation.  Many  problems 
encountered  during  the  design  of  the  receiver  and  transmitter  are  also  discussed.  The 
figures  of  the  full  transmitter  and  receiver  design  are  included  in  Appendix  B,  thus  most 
of  the  figures  that  are  mentioned  in  this  chapter  refer  to  this  Appendix. 

A. SYSTEM GENERATOR

In  order  to  verify  the  design,  the  Simulink  blocks  ‘From  Workspace’  and  ‘To 
Workspace’  and  ‘Scope’  were  used.  ‘From  Workspace’  was  used  to  supply  the 
transmitter  with  message  bits  and  to  pass  the  transmitter  output  to  the  receiver.  ‘To 
Workspace’  gives  the  ability  to  extract  the  values  at  a  specific  point  in  the  design  to  the 
Matlab  environment  in  order  to  drive  simulation  code  with  this  data  or  to  transfer  it  to  the 
receiver.  The  ‘Scope’  depicts  the  data  directly  on  a  plot  with  the  simulation  time  on  the 
horizontal  axis. 

The input sequence to the transmitter was a random sequence of 1900
message bits, preceded by a leading bit that is always 1 and followed by a trailing bit that is always 0. After being encoded,
the bits were transmitted in packets of 128 channel bits (120 encoded bits plus the eight-bit preamble). The correct position of the
preamble  should  first  be  determined.  Figure  14  illustrates  the  signals  of  the  block  ‘Scope’ 
of  Figure  26  in  Appendix  B.  The  first  plot  is  the  encoded  bits,  the  second  plot  is  the 
‘read_enable’  signal  that  releases  the  encoded  bits  to  the  Modulation  Subsystem,  and  the 
third  plot  represents  the  channel  bits  to  be  transmitted.  Channel  bits  are  defined  as  the 
encoded  bit  stream  with  the  preambles.  As  seen  in  Figure  14,  the  preamble  of  the  first 
packet  is  always  positioned  five  bits  ahead,  but  the  rest  of  the  preambles  were  correctly 
positioned  exactly  in  front  of  the  respective  packet. 
Figure 14 Plots of the encoded bits, the ‘read_enable’ and the channel bits (top to
bottom).


As shown by the middle plot of Figure 14, ‘read_enable’ goes high after the
transmission of the preamble, allowing the propagation of the encoded bits (along with
the known delay of five bit durations). At the end of 128 bits, ‘read_enable’ goes back
to low, capturing the encoded bits in the ‘FIFO’ memory. As stated before, the Preamble
Subsystem  does  not  know  when  there  are  no  more  bits  for  transmission  and  an  enable 
low  signal  should  be  manually  triggered.  Otherwise,  it  would  continue  to  transmit  the 
preamble  at  the  proper  instances  for  the  rest  of  the  simulation  time. 

The  receiver,  on  the  other  hand,  is  using  the  preamble  for  timing  purposes  and 
then  removes  it.  The  sequence  without  the  preamble  is  the  input  to  a  Viterbi  decoder  that 
will  regenerate  the  recovered  message  sequence.  It  is  necessary  to  verify  that  the  decision 
logic  is  working  properly.  Matlab  code  was  written  to  simulate  the  decision  circuit  and 
duplicate  the  results  for  comparison  purposes.  In  order  to  avoid  any  phase  mismatch,  the 
values  after  the  mixers  as  shown  in  Figure  32  in  Appendix  B  were  captured  in  the  Matlab 
Workspace  by  blocks:  ‘To  Workspace  0-4’  in  order  to  supply  the  test  code.  This  way,  any 
noise  inserted  by  DDS  blocks  will  not  influence  the  results.  After  the  confirmation  that 
both  the  Matlab  code  and  the  simulation  under  Sysgen  were  providing  the  same  results, 
the  Matlab  code  was  modified  to  include  the  DDS  blocks  as  well  with  the  following 
results. 
As seen in Figure 15, the first two plots are from ‘Scope 3’ (Figure 36 in
Appendix  B)  and  represent  the  decision  waveform  and  the  decisions  made  for  the 
existence  of  the  preamble.  The  vertical  lines  represent  the  detection  of  the  part  of  the 
preamble,  used  for  timing  purposes,  i.e.,  10101.  The  last  two  plots  in  the  white 
background  are  the  respective  results  of  the  Matlab  verification  code.  In  practice,  both 
results  coincide  with  the  correct  position  of  the  preamble.  Recall  that  only  the  timing 
circuit  uses  the  first  preamble  bits  and  the  Decoding  Subsystem  uses  the  rest. 


Figure  15  Results  captured  from  the  ‘Scope  3’  (Figure  36)  in  Simulink  and  results  as 
plotted  by  the  equivalent  Matlab  Code  (from  top  to  bottom). 

Finally,  the  initially  transmitted  and  the  received  bit  streams  must  be  compared.  In 
order  to  verify  the  proper  operation  of  the  receiver,  multiple  runs  were  made  using  the 
same stream but different delay values (‘Delay 2’ in Figure 9) at the entrance of the
receiver.  This  is  done  to  ensure  that  the  receiver  is  time  invariant  and  that  the  specific 
choices  made  for  the  synchronization  of  the  different  subsystems  are  not  case/input  delay 
sensitive.  After  that  process,  different  streams  were  used  with  random  delays.  The 
number  of  errors  was  calculated  by  a  short  Matlab  code  to  confirm  the  visual  verification. 
The specific pattern used for the generation of the bit stream is a random sequence
of nineteen hundred bits, always preceded by a 1 and followed by a 0. Specifically, the value
inserted in the data slot of the parameter window ‘From Workspace’ (Figure 5) was
[0:1901; 1 a 0]' where ‘a’ was defined by the command a = rand(1,1900) < .5 in the
Command Window. This command creates an array of 1900 random bits. The choice of
that  length  was  dictated  by  the  constraints  of  the  current  configuration  of  the  receiver. 
The counters used in different subsystems of the receiver are 18 bits wide. Since there are
64 samples per bit, there are

\[
\frac{2^{18}\ \text{clock cycles}}{64\ \frac{\text{clock cycles}}{\text{bit}}} = 4096\ \text{bits} \tag{5.1}
\]

theoretically possible. Due to delays, the effective limit is slightly under 4096 bits. Given
that the encoding doubles the number of channel bits and accounting for the delays and
the preamble, roughly nineteen hundred information bits can be received. This problem is
discussed further in Section B.2 of this chapter. Every data packet should fit 60
information bits, and there will be four extra bits in the last packet due to the
convolutional encoder. The convolutional encoder does not encode the information bits
of each channel packet separately, but the whole bit stream continuously.
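The capacity estimate can be reproduced with a few lines of arithmetic. The following is only a sketch, written in Python rather than Matlab. The 8-bit preamble length is an assumption inferred from the description of the Decoding Subsystem, and the delays mentioned in the text are ignored, so the result is an upper bound rather than the exact limit.

```python
# Rough upper bound on the number of information bits that fit within
# one period of the free-running 18-bit counters.
COUNTER_WIDTH = 18          # bits (from the text)
SAMPLES_PER_BIT = 64        # clock cycles per channel bit (from the text)
INFO_BITS_PER_PACKET = 60   # information bits per packet (from the text)
ENCODER_EXPANSION = 2       # the encoder doubles the number of bits (from the text)
PREAMBLE_BITS = 8           # ASSUMPTION: length of the full preamble

channel_bits_max = 2**COUNTER_WIDTH // SAMPLES_PER_BIT
packet_bits = PREAMBLE_BITS + ENCODER_EXPANSION * INFO_BITS_PER_PACKET
whole_packets = channel_bits_max // packet_bits
info_bits_max = whole_packets * INFO_BITS_PER_PACKET

print("channel bits per counter period:", channel_bits_max)
print("channel bits per packet (assumed):", packet_bits)
print("information bits, upper bound:", info_bits_max)
```

Under these assumptions the bound is 1920 information bits, which is consistent with the roughly nineteen hundred bits used in the tests once delays are taken into account.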


The  results  are  illustrated  in  Table  1  and  Table  2.  Recall  that  the  input  delay  to  the 
Receiver  is  ‘Delay2’  shown  in  Figure  9.  The  Matlab  code  shown  in  Figure  16  was  used  to 
align  the  output  of  the  Transmitter  and  the  Receiver  and  calculate  the  number  of  errors. 


%% post encoding
num_test_bits = 3820;  % Defines the length of the encoded bits in the
                       % test sequence.
delay = 403;           % The output value is delayed by 'Delay1' in the Decode
                       % Subsystem.
tra  = after_Viterbi_encoder.signals.values(1:num_test_bits);
rec2 = pre_Viterbi_decoder.signals.values(delay:num_test_bits-1+delay);
num_of_errors_pre = sum(abs(tra-rec2)==1)
% position = find(abs(tra-rec2)==1)+delay-1


Figure  16  Matlab  code  to  align  output  of  Transmitter  and  Receiver  and  calculate 
number  of  errors. 
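For readers without access to Matlab, the same alignment and error count can be sketched in Python. This is an illustrative equivalent, not part of the original design. The function name and the short test vectors are hypothetical, and the `delay` argument is kept 1-based so that it corresponds to the Matlab indexing in Figure 16.

```python
def count_channel_bit_errors(tx_bits, rx_bits, delay, num_test_bits):
    """Compare the first num_test_bits of tx_bits with rx_bits shifted by delay.

    delay is 1-based, as in the Matlab listing. Returns the number of
    mismatches and their 1-based positions in the receiver output.
    """
    tx = tx_bits[:num_test_bits]
    rx = rx_bits[delay - 1 : delay - 1 + num_test_bits]
    if len(tx) != num_test_bits or len(rx) != num_test_bits:
        raise ValueError("not enough samples for the requested window")
    positions = [i + delay for i, (a, b) in enumerate(zip(tx, rx)) if a != b]
    return len(positions), positions


# Hypothetical example: the receiver output lags the transmitter by 3 samples.
tx = [1, 0, 1, 1, 0, 0, 1, 0]
rx = [0, 0, 0] + tx
rx_bad = list(rx)
rx_bad[5] ^= 1  # flip one received bit (1-based position 6)

print(count_channel_bit_errors(tx, rx, delay=4, num_test_bits=8))
print(count_channel_bit_errors(tx, rx_bad, delay=4, num_test_bits=8))
```

Returning the error positions as well as the count mirrors the commented-out `position` line of the Matlab listing, which is useful when locating where in a packet an error occurred.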
It must be noted that the number of errors is calculated based on the encoded bit
stream. There are two obvious methods that could be used to check for errors. We could
count message bit errors or channel bit errors. In this work, it was chosen to count
channel bit errors. This has the advantage of counting any error, since the received bits
are examined before the decoder corrects any errors. However, the disadvantage of this
method is that it does not check for errors in the decoder. Since the decoder is provided
by Xilinx, we have confidence in its design and accept this disadvantage as small.

Before the simulation, the following parameters should be defined in the Matlab
Command Window: T =128-10^^ , t =10. New random sequences are generated by
re-executing the command a = rand(1,1900)<.5 in the Command Window.


Run   Number of errors   Comments
1     0                  First execution of command a = rand(1,1900)<.5
2     0                  Second execution of command a = rand(1,1900)<.5
3     0                  Third execution of command a = rand(1,1900)<.5
4     0                  Fourth execution of command a = rand(1,1900)<.5

Table 1. Results of multiple runs with different input sequence and constant input
delay (value set to zero).
Run   Input Delay Value   Number of errors
1     0                   0
2     9                   0
3     15                  0
4     23                  0
5     31                  0
6     45                  0
7     57                  0
8     63                  0
9     75                  0
10    98                  0

Table 2. Results of multiple runs with constant input sequence and variable input
delay. All runs made after first execution of command a = rand(1,1900)<.5.

The simulation of both the transmitter and the receiver showed that the design is
working correctly. The acquisition of the preambles was made at the correct times and
the timing was correctly extracted. In the absence of noise, no malfunction was
observed. Minor issues are presented in the next section, which explains the various
problems that appeared during the design process.

B. TROUBLESHOOTING AND LESSONS LEARNED

Much  knowledge  was  acquired  during  the  design  process.  Even  though  Xilinx  is 
trying  to  offer  programs  that  are  easy  to  use,  there  were  many  instances  in  which  the 
debugging  process  was  time  consuming.  Many  of  these  points  are  illustrated  as  follows 
for  easy  reference  and  as  a  guide  of  things  to  avoid. 
1. Transmitter


The preamble of the first packet does not fit right before the information bits.
There are always five zeros between the end of the preamble and the beginning of the
encoded message bits in this packet. In subsequent packets, the preamble is positioned
correctly. This event should occur every time after a reset. The fact that these inserted
bits are zero does not affect the decoding procedure, and the action taken was to position a
1 in front of every bit stream. This 1 acts as a flag that information bits follow and made
the debugging process easier as well. The initial assumption that a proper combination of
the delay values in blocks ‘Delay 1,’ ‘Delay 2’ and ‘Delay 4’ in Figure 29 would correct
the problem was later rejected. The cause may be the time the input data takes to reach the
FIFO memory of Figure 30.

The Preamble Subsystem is not intelligent enough to sense when data is ready to
be transmitted. It needs external manual control to enable the counter in the
preamble. While the counter of the preamble is not enabled, the input data is
convolutionally encoded and stored in the memory. In the same sense, the preamble
circuit does not know when there is no more data to be transmitted. To avoid the
transmission of packets that contain only the preamble, the information bits must be
provided constantly to the transmitter. The bits that cannot be transmitted at any given time
are stored in an internal memory. Otherwise, the circuit can be manually controlled
through the external enable and reset port. The output signal ‘empty’ from the FIFO
memory of Figure 30 could be a good indication of when the input data is ready
for transmission. This could also help with the preamble problem stated in the previous
paragraph. Nevertheless, this signal does not seem to behave as expected. It goes
high too early, as shown in Figure 17, and does not go low after the transmission of the last
bit stored in the FIFO memory, as illustrated in Figure 18. The reason was not determined
in this research.
Figure 17 Plot of the ‘empty’ output signal and the ‘din’ input signal of the FIFO
memory at the beginning of the simulation.


Figure  18  Plot  of  the  ‘empty’  output  signal  and  the  ‘din’  input  signal  of  the  FIFO 
memory  at  the  end  of  the  simulation. 

The  state  machine  ‘MCode  1’  in  Figure  32  could  have  been  avoided  and  replaced 
by  a  constant  with  an  output  value  of  Boolean  1  connected  to  the  enable  port  of  ‘Mux’ 
through  a  delay.  This  delay  should  be  measured  to  overcome  the  undefined  signal  errors. 
This  tactic  is  used  in  Figure  33  for  delaying  the  enable  of  the  ‘Relational’  block. 

2. Receiver

At the beginning, a point of concern was that the signal used to extract the timing,
i.e., the output of the Correlators and specifically the signal at din of block ‘MCode’ in
Figure 37, did not always provide clear and distinguishable peaks. The decision of
where exactly the peaks occurred can become very vague, and this would yield many
errors. As an example, in Figure 19 a successful acquisition of the preamble is illustrated.
The first, third and fourth detections of the preamble bits seem to be correctly positioned,
which is not the case for the second and fifth. There is a flat area near the theoretical
peaks, which in turn introduces an error into the timing. From inspection of many preambles,
the third and fourth bits of the preamble yielded the least error and these were initially
used for the decision, even though all bits were used for the state machine.
Figure 19 Example of a preamble acquisition with initial values for ‘DDS clock rate.’
Top plot shows the decision signal and bottom plot shows the successive
peak identification made by the timing circuit.

When the verification Matlab code was changed so that it no longer used the output of
the mixers of the System Generator design, it was observed that the DDS blocks used for
mixing in the Receiver, namely ‘DDS Compiler v2 for 1s’ and ‘DDS Compiler v2 for 0s’
in Figure 33, did not yield the proper output signal. The parameter ‘DDS clock rate’,
which seemed irrelevant, was changed for both the transmitter and the receiver in order
to correct the output. The initial value of 500.0 (MHz) was changed to 100.0 (MHz). In
this way the output waveforms of the DDS blocks closely matched those predicted by
Matlab simulation. The improvement to the signal used to extract the timing was dramatic
and many synchronization problems were solved, as indicated in Figure 20. The Xilinx
technical support replied that the ‘DDS clock rate’ should match the ‘FPGA clock period’
defined in the Sysgen token. Thus, because ‘FPGA clock period’ is 10 ns, ‘DDS clock
rate’ should be $\frac{1}{10\,\text{ns}} = 100\,\text{MHz}$. This answer is not very
convincing, given that there is no reason to expose a parameter that must be calculated
uniquely from another parameter already defined. Given that the ‘DDS clock rate’ also
defines the upper limit of the output frequency, as explained in Section A.3 in Chapter IV,
matters become more complicated. Further research into the source of the problem is
needed.
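One possible reading of this behavior, offered only as a hypothesis, is that the block derives its phase increment from the declared clock rate, so a mismatch between the declared and the actual clock scales the output frequency. The sketch below models a generic phase-accumulator DDS in Python. It is not the Xilinx DDS Compiler, and the 32-bit accumulator width and the 10 MHz target frequency are illustrative assumptions.

```python
def phase_increment(f_out_hz, assumed_clock_hz, acc_bits=32):
    """Phase increment a generic DDS would use for f_out at the assumed clock."""
    return round(f_out_hz * 2**acc_bits / assumed_clock_hz)


def actual_output_hz(increment, true_clock_hz, acc_bits=32):
    """Frequency actually produced when the accumulator runs at true_clock_hz."""
    return increment * true_clock_hz / 2**acc_bits


TRUE_CLOCK_HZ = 100e6   # corresponds to the 10 ns FPGA clock period in the text
F_TARGET_HZ = 10e6      # ASSUMPTION: illustrative target, not a design value

f_matched = actual_output_hz(phase_increment(F_TARGET_HZ, 100e6), TRUE_CLOCK_HZ)
f_mismatched = actual_output_hz(phase_increment(F_TARGET_HZ, 500e6), TRUE_CLOCK_HZ)

print("declared 100 MHz -> output %.3f MHz" % (f_matched / 1e6))
print("declared 500 MHz -> output %.3f MHz" % (f_mismatched / 1e6))
```

In this model, declaring 500 MHz while the accumulator is clocked at 100 MHz yields one fifth of the intended frequency. Whether the Xilinx block behaves this way would have to be confirmed against its documentation.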
Figure 20 Example of a preamble acquisition with final values for ‘DDS clock rate.’
Top plot shows the decision signal and bottom plot shows the successive
peak identification made by the timing circuit.

The synchronization circuit of the receiver was not designed to be insensitive
to noise, except for the very limited capability that the ‘MCodeO’ block in Figure 37
provides. ‘MCodeO’ was inserted into the design to implement the idea of a threshold that
a signal should exceed in order to be mapped to 0 or 1. The main problem
without that threshold was that even a small amount of noise in the channel would make
the decision circuit believe that bits were being transmitted, and eventually it would match
the preamble to the random noise. Averaging the timing extracted over more than one
bit of the preamble would also give better immunity to noise. Then, the Viterbi
decoder could correct the few mistakes made.
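The two ideas in this paragraph, a detection threshold and timing averaged over several preamble bits, can be sketched as follows. This is a hypothetical Python illustration, not the logic of the ‘MCode’ blocks. The names `mf1`, `mf0` and `peak_times`, the threshold value and the sample peak positions are all assumptions made for the example.

```python
def decide(mf1, mf0, threshold):
    """Map two matched-filter outputs to 1, 0, or None (no signal detected)."""
    if max(mf1, mf0) < threshold:
        return None
    return 1 if mf1 >= mf0 else 0


def averaged_first_peak(peak_times, samples_per_bit=64):
    """Estimate the time of the first peak by averaging over all observed peaks.

    Each peak k is expected at first_peak + k * samples_per_bit; the mean of
    the deviations from that grid refines the estimate of first_peak.
    """
    ref = peak_times[0]
    residuals = [t - ref - k * samples_per_bit for k, t in enumerate(peak_times)]
    return ref + sum(residuals) / len(residuals)


# Hypothetical peak positions for five preamble bits, each a few samples off.
peaks = [100, 166, 227, 292, 357]
print(decide(50, 3, threshold=20), decide(3, 50, threshold=20), decide(5, 3, threshold=20))
print(averaged_first_peak(peaks))
```

Averaging spreads the error of any single flat or displaced peak over all the observed bits, at the cost of waiting for more of the preamble before the timing is fixed.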

Another drawback of the program included in the ‘MCode’ block of Figure 37,
which is the main decision logic, is that there is no escape from going sequentially
through all the states. Once a state is entered, the block only searches for the criteria to
enter the next state, up to the last one. A maximum stay time at each state should have been
specified, after which the state machine would start over. This is also a way to compensate
for noisy reception.
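A maximum stay time of this kind can be sketched as a small state machine. The Python class below is a hypothetical illustration, not the code of the ‘MCode’ block. It searches for the timing pattern 10101 in a stream of decisions and starts over when it has waited too long in any state after the first. The value of `max_stay` is an assumption.

```python
PATTERN = (1, 0, 1, 0, 1)  # timing part of the preamble (from the text)


class PreambleSearch:
    """Sequential pattern search that starts over after a maximum stay time."""

    def __init__(self, max_stay):
        self.max_stay = max_stay  # ASSUMPTION: limit in decisions per state
        self.state = 0            # number of pattern bits matched so far
        self.stay = 0             # decisions spent in the current state

    def step(self, decision):
        """Feed one decision (1, 0 or None); return True when the pattern completes."""
        if decision == PATTERN[self.state]:
            self.state += 1
            self.stay = 0
            if self.state == len(PATTERN):
                self.state = 0
                return True
            return False
        self.stay += 1
        if self.state > 0 and self.stay >= self.max_stay:
            self.state = 0        # abort a probable false start
            self.stay = 0
        return False


fsm = PreambleSearch(max_stay=3)
fsm.step(1)                       # a noise spike is taken for the first bit
for _ in range(3):
    fsm.step(None)                # nothing follows, so the search is aborted
print(fsm.state)
```

After the abort the machine is back in its initial state and can acquire a genuine preamble that arrives later.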

The counters of the receiver, i.e., ‘Counter 3’ in Figure 37 and ‘Counter 1’ in
Figure 38, are free-running counters, which means that they are never reset to restart
counting; they are only limited by the assigned output precision. The MCode blocks use
these counters to record the time of certain events. If a counter is reset at an
improper time, the relative timing of the events is destroyed. For example, ‘MCode’ in
Figure 38 counts 120 bit periods after the reception of the full preamble before it
searches for a new preamble. If the counter resets during this counting, then the block will
stay in the waiting state for an unknown time. On the other hand, the counters will
eventually reset (wrap around) after exceeding the maximum assigned width of their output.
For this reason, a reset of these counters is needed, but it must be built in such a way that
it will not affect the timing of the following MCode blocks. An attempt to solve that problem
in ‘Counter 3’ of Figure 37 using the signal ‘reset_counter’ was only partially successful
and the signal was disconnected.
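One common way to make elapsed-time measurements tolerate a counter wrap, mentioned here as a possible alternative and not as the approach taken in this design, is to compute differences modulo the counter range instead of comparing absolute counter values. The Python sketch below assumes an 18-bit counter and a wait that is shorter than one full counter period.

```python
WIDTH = 18
MODULUS = 1 << WIDTH   # an 18-bit counter wraps after 262144 counts


def elapsed(now, start, modulus=MODULUS):
    """Counts since start, correct across a single wrap of the counter."""
    return (now - start) % modulus


def wait_done(now, start, wait_counts, modulus=MODULUS):
    """True once at least wait_counts counts have passed since start."""
    return elapsed(now, start, modulus) >= wait_counts


# Hypothetical case: the wait of 7872 counts begins 100 counts before the wrap.
start = MODULUS - 100
now = (start + 7872) % MODULUS

print(now >= start + 7872)           # naive absolute comparison
print(wait_done(now, start, 7872))   # modular comparison
```

The naive comparison never becomes true after the wrap, which corresponds to the block staying in the waiting state for an unknown time, whereas the modular difference still reports the correct elapsed count.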

The real function of these counters should be further analyzed. A state machine
that follows the flow diagram uses ‘Counter 3’ (Figure 37) and ‘Counter 1’ (Figure 38).
The counters serve to ensure that the exact time after the reception of the preamble has
passed before a new search for a preamble is made. In detail, the ‘MCode’ of the Decision
Circuit (Figure 37) detects up to the fifth bit of the preamble and then waits for 7872
clock counts ($123\ \text{bits} \times 64\ \text{clock counts/bit}$) until the next detection.
On the other hand, the ‘MCode’ of the Decoding Subsystem, which detects up to the eighth
bit of the preamble, must count 120 clock counts (the counter now works in bit periods
because the signal has already been downsampled by 64) until the next detection.

The FIR filters in the NC Matched Filter subsystem, as illustrated in Figure 34,
were initially implemented with the respective Xilinx FIR blocks. The problem that
appeared was that Sysgen always mapped these blocks to DSP48 cells (see the
section on DSP-Enhanced FPGAs in Appendix A). The four 64-coefficient filters need
128 DSP48 cells, whereas only 48 are present in the chip. To avoid this problem, and
given that all the coefficients were unity in this case, a custom design was made as shown
in Figure 21. In this way, not only was the demand for DSP48 cells minimized, but the
demand for general blocks was lower as well. The 64-delay block is a pipe of 64 flip-flops
in a row. The ‘new value’ is the value that enters the pipe and the ‘old value’ is the one
that exits the pipe. The combination of the ‘AddSub’ and ‘Accumulator 1’ blocks is
responsible for adding every new value to the current sum while subtracting the value that
is 64 periods old. In this way, the accumulator always contains the sum of the 64 most
recent values. The accumulation of garbage over time in ‘Accumulator 1’ is possible. This
hypothesis has not been confirmed during the simulations, but is still a concern for the
real circuit on the FPGA. A reset of the accumulator when the message part of the packet
is under reception would ensure that no garbage is left in the accumulator.


Figure  21  FIR  custom  block. 
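The structure of Figure 21 corresponds to a running sum, which can be modeled in a few lines. The Python sketch below is a behavioral illustration only. The class name is hypothetical, integer samples are assumed, and the check at the end compares the running sum against a direct 64-tap sum with unity coefficients.

```python
from collections import deque


class MovingSum:
    """Running sum of the last `length` inputs, using a delay line and one accumulator."""

    def __init__(self, length=64):
        self.pipe = deque([0] * length, maxlen=length)  # the 64-delay block
        self.acc = 0                                    # plays the role of 'Accumulator 1'

    def step(self, new_value):
        old_value = self.pipe[0]           # the value that is `length` periods old
        self.pipe.append(new_value)        # maxlen pushes old_value out of the pipe
        self.acc += new_value - old_value  # 'AddSub' followed by accumulation
        return self.acc


# Deterministic test input and a direct evaluation of the unity-coefficient filter.
x = [(7 * i) % 11 - 5 for i in range(200)]
ms = MovingSum(64)
out = [ms.step(v) for v in x]
direct = [sum(x[max(0, n - 63): n + 1]) for n in range(len(x))]
print(out == direct)
```

With integer arithmetic the two outputs agree exactly. Because the structure is recursive, any single corrupted update would persist in the accumulator, which is why a periodic reset of the kind suggested in the text is attractive.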

The ‘Viterbi Decoder v 6 0’ in the Decoding Subsystem included in Figure 38
appears in green, which means that an extra license must be granted by Xilinx. In this
case, a free 90-day license, which is offered to anyone through the Xilinx website, was
acquired in order to verify the functionality of the design. In any case, the verification of
the circuit was made before the Viterbi Decoder block, because the exact number of errors
should be revealed and not be masked by corrections made by the decoder.

The  verification  of  the  design  was  illustrated  in  this  chapter.  Many  of  the 
problems  encountered  were  also  exposed  and  useful  lessons  learned  were  also  described. 
This  part  concludes  the  discussion  about  the  design.  The  next  chapter  summarizes  the 
work  that  has  been  done  in  this  thesis  and  proposes  possible  expansions  and  follow  on 
work  that  can  be  made. 
VI. CONCLUSIONS


A. SUMMARY OF THE WORK

The  concepts  of  Software  Defined  Radio  (SDR)  and  Binary  Frequency  Shift 
Keying  (BFSK)  modulation  were  introduced  and  the  application  of  Field  Programmable 
Gate  Arrays  (FPGAs)  to  SDR  was  further  examined.  The  capabilities  offered  by  the 
FPGAs  to  easily  transform  a  design  to  circuit  were  used  to  build  a  BFSK  transceiver. 

Xilinx  System  Generator  was  used  to  design  a  data  aided  BFSK  transmitter  and 
receiver.  Extensive  simulation  assures  their  proper  function.  Matlab  code  was  used  to 
verify  the  results  taken  by  the  simulation.  The  designs  were  finally  placed  and  routed  to  a 
Virtex-4  FPGA  to  ensure  that  no  errors  occurred  during  that  process. 

Appendix  A  includes  an  introduction  to  FPGAs,  their  internal  structure  and  their 
utilization  in  the  SDR  concept.  Extensive  descriptions  of  all  the  blocks  used  and  the 
parameters  assigned  to  each  block  are  given  in  Appendix  B.  This  facilitates  the 
reproduction  of  the  design  and  gives  a  better  understanding  of  how  System  Generator  is 
working.  In  Appendix  C,  the  Matlab  code  used  to  simulate  the  decision  signal  is  given. 
The  reproduction  of  the  whole  circuit  is  not  as  important  as  the  reproduction  of  the 
decision  signal,  because  this  is  the  most  crucial  parameter  in  the  whole  design. 

B.  SIGNIFICANT  RESULTS 

The  concept  of  Software  Defined  Radio  proved  to  be  fully  realizable  and  both  the 
designs  of  the  transmitter  and  receiver  do  not  exceed  in  total  the  capacity  of  a  moderate 
FPGA,  after  being  placed  and  routed.  The  device  utilization  summary  for  the  transmitter 
and  receiver  is  shown  in  Table  3.  The  amount  of  work  needed  to  design  a  fully  functional 
transceiver  was  mostly  consumed  in  the  learning  of  the  System  Generator  blocks  and  ISE 
suite,  which  is  an  overhead  not  needed  for  follow-on  designs. 
Number of                Transmitter   Receiver      Total
DSP48s (total: 48)       0 (0%)        4 (8%)        4 (8%)
Block RAMs (total: 72)   4 (5%)        10 (14%)      14 (19%)
Slices (total: 10752)    177 (2%)      3616 (34%)    3793 (35%)

Table 3. Device utilization summary.

The simulation proved that the receiver works as expected and, when noise is not
present, no errors were generated by the receiver. The preamble was correctly positioned
in front of each packet, with the exception of the first one as discussed in Section B.1,
Chapter V. The preamble helps the receiver to identify the beginning of every packet and
extract timing information. The receiver then removes the preamble and decodes the
received sequence. The few errors caused by noise are corrected by the Viterbi decoder.
No timing errors were identified during the place and route process performed by the ISE
software.

C.  SUGGESTIONS  FOR  FUTURE  WORK 

1. Limitation of the Design

The timing circuit is not built to be very tolerant to noise. The main reason for the
susceptibility to noise is that the state machine (MCode in Figure 37) incorporated in the
receiver does not include an abort condition in case there is a misinterpretation of the
decision signal. Specifically, if the noise exceeds the chosen threshold during the
period in which the circuit tries to identify the first bit of the preamble, the
state machine is obliged to enter the next stage and does not abort until it goes through all
the states sequentially. Then it must wait a packet period before it searches again for a new
preamble sequence.

The frequencies used are not orthogonal according to the definition given in
Section A in Chapter II. This was due to the initial complications described in Section B.2
in Chapter V. In order for the two frequencies to become orthogonal, they must differ by
integral multiples of the channel bit rate, $\frac{1}{64} \cdot 10^{2}$ Mbps. In this case,
the frequency separation should be a multiple of 1.5625 MHz. For example, if
$f_0 = 40$ MHz, the other frequency could be $f_1 = 43.125$ MHz.
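The orthogonality condition can be checked numerically. The Python sketch below sums the complex envelope of the frequency difference over one channel bit, assuming the 100 MHz sample rate and 64 samples per bit stated in this thesis. The 2 MHz separation is a hypothetical counter-example and is not claimed to be the separation used in the design.

```python
import cmath


def tone_correlation(delta_f_hz, fs_hz=100e6, samples=64):
    """Normalized magnitude of the correlation between two tones delta_f apart.

    Only the difference-frequency term of the complex envelope is evaluated,
    which is the quantity that matters for non-coherent detection.
    """
    total = sum(
        cmath.exp(2j * cmath.pi * delta_f_hz * n / fs_hz) for n in range(samples)
    )
    return abs(total) / samples


print(tone_correlation(43.125e6 - 40e6))  # separation of 2 x 1.5625 MHz
print(tone_correlation(2e6))              # hypothetical non-multiple separation
```

For the 3.125 MHz separation the difference term completes exactly two cycles within the bit and the correlation vanishes up to rounding, while the 2 MHz separation leaves a residual correlation of roughly 0.19 that would appear as crosstalk between the two matched filters.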


The limitation of the free-running counters must also be addressed. As explained
in Section B.2 of Chapter V, the current configuration only guarantees the reception of
fewer than two thousand message bits before the counters are reset and a critical error may
occur. Extending the free-running counters from 18 bits to a larger width would
only postpone the problem without solving it. A reset signal
should be inserted at a proper time that will not affect the rest of the design.

Pulse shaping is not used in the design, but would likely help to suppress Inter
Symbol Interference [27, pp. 233-244], provided it was done in a way that preserves
orthogonality. The realization of filters in System Generator is made easy by the use of
the Xilinx ‘FDATool’ and ‘FIR Compiler’ blocks. ‘FDATool’ interfaces with the Simulink
Signal Processing Toolbox to offer a graphical interface for designing digital filters. ‘FIR
Compiler’ can also be used alone in case the coefficients are precalculated.


2.  Suggestions 

The design presented is a good starting point for a design including extra features,
such as noise tolerance and pulse shaping. These features are optional but will make this
implementation more useful in practice. The design should then be verified more
exhaustively after transfer to the FPGA to assure proper timing of the components. The
simulation under System Generator is cycle and bit accurate [12]. Even so, although
no timing errors were reported after place and route and the simulation under System
Generator verified the proper function of the circuit, further verification of the design
after implementation is compulsory. ChipScope Pro is a useful way to test the design
after download to the target FPGA. In order to do so, pins must be assigned to the
‘Gateway In’ and ‘Gateway Out’ blocks. These pins must be the ADC input to and the DAC
output from the board.
The DDS clock rate discussed in Section B.2, Chapter V, and its connection to the
DDS block must be further examined. Even if in this design the output frequency of the
block is correct, there is no guarantee that it will work as well under different design
parameters. Even the extensive documentation included in the online support page [14]
does not give clear answers about the relation between the DDS clock rate, the output
frequency of the DDS block, and the FPGA clock period in the Sysgen token.

Apart from the problem with the DDS block, there are other less significant design
errors to be solved. The delay after the first preamble, as presented in Section B.1,
Chapter V, should be further examined. The cause of this delay is currently unknown and
it could not be removed by changing the values of the delays included in the design. In the
same section, the inability to detect incoming message bits is also discussed. This may be a
severe limitation in real-life implementations.

After proper verification of the design, thorough tests under different levels of
noise could be done. Bit error rate versus Signal to Noise Ratio can be plotted and compared
to the theoretical performance of a non-coherent BFSK receiver. In this way this design
will be fully documented.
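As a reference curve for such a comparison, the textbook bit error probability of non-coherent detection of orthogonal BFSK in additive white Gaussian noise is $P_b = \frac{1}{2} e^{-E_b / (2 N_0)}$. The Python sketch below evaluates it. Note that the formula assumes orthogonal tones and uncoded bits, so for this design it would apply to channel bits and only after the frequencies are made orthogonal.

```python
import math


def ncfsk_ber(ebn0_db):
    """Theoretical bit error probability of non-coherent orthogonal BFSK in AWGN."""
    ebn0 = 10 ** (ebn0_db / 10)      # convert dB to a linear ratio
    return 0.5 * math.exp(-ebn0 / 2)


for snr_db in (0, 4, 8, 12):
    print(snr_db, "dB ->", ncfsk_ber(snr_db))
```

Measured error counts from noisy simulation runs, divided by the number of channel bits compared, could then be plotted against this curve.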

This design can be used as a foundation for designs using more complex
modulation schemes. The extension to M-Frequency Shift Keying (MFSK) is likely
straightforward and modification to Binary Phase Shift Keying (BPSK) should be easy,
although it would require carrier frequency and phase synchronization [27, pp. 270, 295].
In this manner, a database of modulation schemes can be created, leading closer to the
ultimate goal of a multimode SDR transceiver.

The electronic files that contain this design are on file with the manager of the
Cryptologic Research Laboratory [28]. Helpful support documentation from Xilinx is
discussed in Section C, Chapter III.
APPENDIX A. BACKGROUND ON FPGA AND TECHNOLOGIC BACKGROUND

This  Appendix  introduces  the  internal  structure  of  Field  Programmable  Gate 
Arrays,  their  function  and  how  they  are  implemented  in  the  Software  Defined  Radio 
concept.  An  evaluation  of  different  technological  options  in  implementing 
communication  modulating  techniques  and  Software  Defined  Radio  follows.  A 
comparison  between  these  options  is  also  included. 

A. BRIEF DESCRIPTION OF AN FPGA

Xilinx  Inc.  invented  the  FPGA  in  1984  [29].  As  electronic  circuits  were 
becoming  more  advanced,  the  glue  logic  [30]  was  getting  more  complex  and 
improvements  to  the  Complex  Programmable  Logic  Devices  (CPLD)  were  needed  to 
handle  more  demanding  applications.  FPGAs  came  as  a  logical  advancement  to  help 
interconnect  large  integrated  circuits  providing  more  printed  logic  and  incorporating 
more  gates. 

Initial manufacturing technologies of FPGAs included antifuse, Static Random
Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory
(EEPROM) and some minor types [31]. They differ in that SRAM is reprogrammable but
requires an external boot device, antifuse is one-time programmable, and EEPROM
is reprogrammable and does not need an external boot device. Nowadays, the main type
used is SRAM, except when reprogrammability is not a mandatory feature, in which case
antifuse is a cheaper solution.
Figure 22 Simplified Version of FPGA Internal Architecture (From: [32]).

[Figure 23 labels: Logic Tiles; Global Network; Efficient Long Lines; High Speed Very Long
Lines; Clock Aggregation (Splittable Clock Spine); Logic Tile; Ultra-Fast Local Routing;
Routing Switch]

Figure 23 Typical FPGA architecture (From: [33]).




A  Field  Programmable  Gate  Array  is  a  two  dimensional  array  of  logie  bloeks  and 
flip-flops  with  eleotrieally  programmable  intereonneetions  between  them  [32],  These 
intereonneetions  ean  be  identified  in  Figures  22  and  23  [33]  and  are  distinguished  in  loeal 
(or  short)  and  long  routing  lines.  Logie  Tiles,  also  ealled  Logie  Sliees  aeeording  to  other 
manufaeturers,  are  the  smallest  bloeks  of  logie.  Due  to  its  versatile  strueture,  a  different 
primitive  operation  (addition,  multiplieation,  ete.)  ean  be  assigned  to  eaeh  Logie  Tile.  In 
order  to  build  more  eomplex  funetions,  many  Logie  Tiles  are  attaehed  to  an  adaptive 
network.  Eleotrieally  programmable  switohes  (as  shown  in  Figure  23  under  the  subtitle 
Routing  Switoh)  are  responsible  for  oustomizing  the  network. 

From the aforementioned description, it is possible to identify the two
configurable aspects of FPGAs:

•  The function assigned to each logic block. This function defines
which elements inside the logic block will be activated in order to yield
this specific function. The logic block itself must have a structure that can
support a wide variety of different logical functions. One such structure of
great importance is the lookup table.

•  The interconnections between the logic blocks. The combination of many
primitive functions assigned to different blocks can yield very complex
functionality. Due to the way that this function is implemented,
it may even be executed faster than when a microprocessor is used instead.
Nevertheless, it should be kept in mind that this flexible routing adds
much overhead to the chip itself. It consists of a wiring grid controlled by
electrically programmable switches, and the interconnection overhead can
even be close to two thirds in terms of power consumption and silicon in
deep submicron processes [34]. The higher the flexibility of routing in an
FPGA, the higher the utilization of the logic and the lower the density of
logic blocks. The manufacturers of FPGAs should always consider this
tradeoff for their products.
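The role of the lookup table mentioned in the first item can be illustrated with a short model. The Python sketch below is a conceptual illustration and not a description of any particular device. It shows that one small memory, addressed by its inputs, can realize any Boolean function of those inputs simply by changing its stored contents.

```python
def make_lut(function, n_inputs):
    """Precompute the truth table of function over all n-input combinations."""
    table = []
    for address in range(2 ** n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]
        table.append(function(*bits) & 1)
    return table


def lut_read(table, *bits):
    """Evaluate the stored function by using the input bits as a memory address."""
    address = sum(bit << i for i, bit in enumerate(bits))
    return table[address]


# The same 3-input structure configured as two different functions.
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
majority = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)

print(xor3, lut_read(xor3, 1, 1, 0))
print(majority, lut_read(majority, 1, 1, 0))
```

Configuring the device then amounts to loading such tables and setting the routing switches, which is why the same silicon can implement very different circuits.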

In Application Specific Integrated Circuits (ASICs), no need exists for logic tiles,
nor for long routing lines, because the operation of their logical components is prespecified
and sequential logical blocks are placed closer to each other during the manufacturing
process. This makes the design much more concentrated and more efficient, but it lacks the
main characteristics of FPGAs, versatility and upgradability [76].

A detailed description of FPGAs is given in [35], which can be used for further
reference.
B. ADVANTAGES AND APPLICATIONS OF FPGAS

Until  fairly  recently,  FPGAs  did  not  have  enough  gate  capacity  or  computational 
power  to  implement  digital  signal  processing  (DSP)  tasks.  They  have  also  been  perceived 
as being expensive and power hungry. The versatility and the extra capabilities that they
acquired after the turn of the century have changed many of their applications. One of
their newest features is the introduction of hard embedded multipliers, which yield
extra DSP capabilities. A detailed description of the embedded multipliers is given later in
this chapter.

The synthesis and development tools have also evolved and include many
different design environments. Besides the Hardware Description Languages, every
FPGA manufacturer offers a proprietary design suite consisting of block diagram
design tools or schematic processors. These tools are time and signal accurate, and
their ease of use shortens the learning curve and the debugging process, which in turn
leads to short time to market. Intellectual Property (IP) cores are also available. These are
designs implementing complex functions that can be incorporated into other designs for a
fee. Usually, they are also available from third party vendors, and their acquisition
shortens the time to market and reduces the need for proficiency in FPGA design
[36].

An  example  of  great  interest  is  mobile  communications.  The  newer  CDMA2000 
EVDO  and  W-CDMA  standards  demand  computationally  intensive  digital  signal 
processing, which requires much power. The ASIC solution is always better suited to
such an environment, but its lack of upgradability makes it undesirable. The standards do
not last for more than four to five years, making the replacement of the hardware at a
base station uneconomical. The FPGA is the solution that closely matches the
effectiveness of ASICs while retaining the ability to upgrade [37].

Another  important  feature  of  FPGAs  is  their  ability  to  conform  to  today’s  general 
trend  towards  system-on-chip  (SoC),  thus  making  the  integration  of  many  different 
circuits  in  one  single  chip  a  reality.  This  saves  both  money  and  space,  offering  high 
bandwidth between devices. FPGAs are used for many diverse applications including
ASIC prototyping, digital signal processing, medical equipment, military systems,
Artificial Intelligence (AI) and cryptography [38].

The characteristics explained in detail above yield designs implemented in FPGAs
with short time to market and reduced cost of development. Also, the maintenance and
upgrade cost can be minimized, if this is applicable to the specific application.

C.  FPGA  VS.  GPP 

Until recently, the fabrication, or engraving, process for General Purpose
Processors (GPPs) was one generation ahead of the engraving process for FPGAs.
Nowadays, this is no longer true. The current generation of Intel’s Penryn® processors
uses 45nm lithography, and both the MIPS32 74K and ARM Cortex®-A9
utilize a TSMC 65nm generic process [39]. Today, the manufacturing process of FPGAs
closely matches that of GPPs, with Xilinx’s latest FPGA series Virtex®-5 [40] and
Altera's Stratix® III [41] using a 65nm engraving process. Altera also recently released
the 40nm Stratix® IV series [42].

The benefit of switching to a smaller manufacturing process is a smaller die,
even when more circuits are packed onto it. Therefore, the question is how
efficiently the extra gates can be used. In the past, this extra silicon was used first for
deeper pipelines with more complex prediction circuits, then for more on-chip cache
memory, and finally to add more cores. A multicore processor is a single chip containing
multiple processing engines that may share common resources, such as cache memory.
Each of these techniques reached a point of diminishing returns. The only hope is that
the addition of more cores could yield the extra computational power needed, but
efficient multicore programming is a challenge. Traditional programming techniques
cannot take advantage of multicore GPPs, and more time will be needed
before parallel programming matures.

FPGAs can use the extra silicon in a more application-specific way. It is easier to
build a circuit that uses parallelism than to write a similar program in software. Thus,
as technology advances and offers higher density chips, it is always possible to make
use of this increased amount of logic in FPGAs. This is not the case in GPPs [43]. The
first implementations of SDR used mostly GPPs [44], but the evolution of DSPs
made them less practical. Nowadays, the GPP is extensively used for another purpose: it
is responsible for the network protocols and any application used. The new generation of
FPGAs also offers IP cores of soft processors, eliminating, in many cases, the need for
an extra GPP. Following this philosophy, Actel offers the Cortex-M1 [45] and ARM7
soft processors, and Xilinx offers the PowerPC 440 [46] as an embedded processor.

Although ARM processors seem to dominate the GPP market for portable
devices, STI’s (Sony, Toshiba, IBM) Cell processor and other completely new
products launched this year also exist, but they have not yet had time to prove their
capabilities. These products include Intel’s Atom™ [47] and Via’s Nano™
processors [48].

D.  FPGA  VS.  DSP 

DSP  chips  provide  good  performance  and  usually  offer  an  easier  development 
process,  which  also  means  quicker  time  to  market.  Some  modern  DSP  chips  are  very 
capable  and  they  sometimes  feature  on-chip  Viterbi  and  matrix  multiplier  coprocessors 
and a plethora of connectivity and memory options [56]. The first in line is the Texas
Instruments® TMS320C6455, which has a 1.2 GHz clock, is engraved with 90nm
process technology and executes up to 9600 million instructions per second (MIPS) [49].
Another high end chip is Freescale’s MSC8144 multicore DSP [50], which can be
accompanied by the MSBA8100, an accelerator for Fourier transforms and channel
decoding, which is especially made to accelerate 3G-LTE, WiMAX and 3GPP-R6. Each
of the four cores of the MSC8144 runs at 1 GHz, and the chip was the best performer among DSP
chips on some tests made by Berkeley Design Technology [51].

High power consumption is another drawback of DSP devices. Mobility is a
major concern nowadays, with portable wireless devices in wide use, and in the future this
demand will likely increase. Many Bluetooth and WiFi products exist and the demand for
mobile communication will grow, requiring even more efficient DSP techniques. Much
research has shown that DSP chips consume much more energy than ASICs or even
FPGAs. “BDTi Focus Report: FPGAs for DSP, Second Edition” [52] gives an
extensive analysis and comparison of the consumption of different kinds of chips, leading
to the conclusion that the parallelism inherent in FPGAs can save much energy compared
to the same number of DSP cores. “FPGAs vs. DSPs: A look at the
unanswered questions,” which is an abstract of the BDTi report, mentions that in
DSPs only a small fraction of the silicon real estate is devoted to the actual calculations
while most is assigned to the transportation of data to the correct place. Therefore, the authors
conclude that “it would be a mistake to assume that FPGAs are inherently less energy
efficient than DSPs”. An example then shows that even though the raw power
consumption of an FPGA is much higher than that of a comparable DSP, the FPGA can handle
many more channels per chip, leading to only a fraction of the power consumption per
channel of the DSP.
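
The per-channel argument above can be made concrete with a small calculation. The sketch below is only illustrative: the function name and every numeric value are hypothetical and are not taken from the BDTi report [52].

```python
def power_per_channel(chip_power_w: float, channels: int) -> float:
    """Return the power drawn per processed channel, in watts."""
    if channels <= 0:
        raise ValueError("channels must be positive")
    return chip_power_w / channels


# Hypothetical figures, for illustration only (not from the BDTi report):
# the FPGA draws five times more raw power but handles forty channels.
dsp_w_per_ch = power_per_channel(chip_power_w=2.0, channels=1)
fpga_w_per_ch = power_per_channel(chip_power_w=10.0, channels=40)
ratio = fpga_w_per_ch / dsp_w_per_ch
```

With these assumed numbers, the FPGA consumes more power in total yet only one eighth of the DSP's power per channel, which is the kind of outcome the report describes.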

DSP performance cannot easily compete with that of either ASICs or FPGAs. The
main reason is that DSP chips are serial processors, whereas many DSP applications
benefit greatly from the inherent parallel structure of both ASICs and FPGAs.

According to Douang Phanthavong [53], FPGAs that have been optimized to
perform a digital signal processing task will run anywhere from 10 to more than 1000
times faster than a stand-alone DSP device. This is the main reason that modern DSPs
include special coprocessors. Especially for communication applications, Viterbi and
Turbo code coprocessors have been developed, removing the need to use multiple
DSPs [37]. However, not all needs can be satisfied by special coprocessors, and the
unlimited customization that FPGAs offer can match the needs in a more favorable way
[53].

An example from “Embedding FPGAs in DSP-driven Software Defined Radio
applications” by Rodger Hosking and Richard Kuenzler examines the case of a wideband
Finite Impulse Response (FIR) digital filter. Assuming that this filter requires 32
Multiply Accumulate (MAC) operations in every clock cycle, it is easy to incorporate 32
MACs in an FPGA design, which are hardwired, yielding greater speed. In contrast,
DSPs usually incorporate only two multipliers and will be considerably slower. Notice
that a hardware MAC can be clocked up to 500 MHz [54].
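
The throughput gap in this FIR example can be estimated with simple arithmetic. The following sketch assumes an idealized model in which the only limit is the number of MAC units available per clock cycle, ignoring memory bandwidth and pipeline effects. The function name is hypothetical, and the 1.2 GHz DSP clock is an assumed value borrowed from the TMS320C6455 figure quoted earlier, not a number from [54].

```python
import math


def fir_output_rate(taps: int, macs_per_cycle: int, clock_hz: float) -> float:
    """Output samples per second for a FIR filter needing `taps` MACs per output."""
    cycles_per_output = math.ceil(taps / macs_per_cycle)
    return clock_hz / cycles_per_output


TAPS = 32  # MAC operations per output sample, as in the example above

# FPGA: 32 hardwired MACs, clocked at the 500 MHz upper bound quoted above.
fpga_rate = fir_output_rate(TAPS, macs_per_cycle=32, clock_hz=500e6)

# DSP: two multipliers; the 1.2 GHz clock is an assumption for illustration.
dsp_rate = fir_output_rate(TAPS, macs_per_cycle=2, clock_hz=1.2e9)
```

Under these assumptions the FPGA produces one output per clock cycle, while the DSP needs 16 cycles per output despite its higher clock rate.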
E. FPGA VS. ASIC


ASICs are hardwired [32], custom chips designed for a specific application
instead of working as a GPP. It is hard for any other type of chip to compete with them because they
combine all the most wanted characteristics: ASICs can achieve
energy efficiency, low cost and high performance at the same time. They have only one drawback. They
require all the design effort and most of the expenditure to be made up front, and no
changes can be made without paying all of these costs again.

ASICs emerged in the place of DSPs offering better performance, power and cost
compared to the latter, because they could use the silicon real estate more efficiently. In
ASICs, only the compulsory interconnections and the exact number of logic cells exist.
Thus, in high volume the price per unit is definitely lower than that of any other chip [55]. As
stated, DSPs have a fixed cost regardless of the purchased quantity. In addition,
FPGAs cannot use their silicon as efficiently because of the interconnection overhead.
However, this is only one side of the coin. In order to produce ASICs, it
is necessary to first print the corresponding masks. This cost is included in the
nonrecurring engineering (NRE) costs and makes the production of small quantities
prohibitive [56].

Having an ASIC perfectly matched to a specific application does not always solve
all the problems. Even if this approach is guaranteed to achieve the maximum speed
along with minimum resource consumption, it demands much time for the initial design,
which increases exponentially with its complexity. Nowadays, with very short product
lifetimes, even a delay of some weeks may force a product to miss the market window, with
catastrophic results for sales.

Upgradability and reprogrammability are additional characteristics missing in
ASICs. As opposed to FPGAs, ASICs must be designed and manufactured to exactly the
specifications imposed. If not, most of the NRE must be paid again, plus any extra
expenses to retire the defective products from the market. This process is expensive and
undesirable. On the other hand, FPGAs can even be shipped with bugs and then be
corrected by a simple download (if the proper measures are taken for that purpose). For
DSPs, which work based on software, reprogrammability is also viable.

In conclusion, it is becoming harder and harder to find devices that have the
luxury of being time to market insensitive and high volume cost effective. In addition,
even in that case, there is a place in the market for FPGAs to equip the first versions of a
new product, until the design is proven to be robust. FPGAs do not outperform ASICs
either in terms of speed or in power consumption. Nevertheless, this margin is not as
significant as that between DSPs and FPGAs.

An optimized implementation in FPGAs can be almost as good as one in ASICs,
additionally offering the ability for future upgrades and the flexibility of a System on a
Chip. This flexibility is acquired at the expense of price per unit. Given that FPGAs
do not demand significant NRE costs, there is a place for them in the market. It is hard to
estimate the production quantity at which the cost curves cross. As the engraving process
shrinks, the expenses associated with the manufacturing of fabrication units for ASICs
go up. In order to keep the manufacturing cost of ASICs low, they should be engraved
using a larger scale, making the comparison between ASICs and FPGAs even vaguer [57].
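
The turning point discussed above can be expressed as a break-even volume. The sketch below assumes a simple linear cost model (total cost = NRE + unit price × volume) for both technologies. The function name and all dollar figures are hypothetical and serve only to show the shape of the calculation.

```python
import math


def breakeven_volume(asic_nre: float, asic_unit: float,
                     fpga_nre: float, fpga_unit: float) -> int:
    """Smallest volume at which total ASIC cost drops to or below total FPGA cost."""
    if fpga_unit <= asic_unit:
        raise ValueError("no crossover: FPGA unit cost must exceed ASIC unit cost")
    return max(0, math.ceil((asic_nre - fpga_nre) / (fpga_unit - asic_unit)))


# Hypothetical costs in dollars, chosen only to illustrate the calculation.
units = breakeven_volume(asic_nre=1_000_000, asic_unit=5.0,
                         fpga_nre=50_000, fpga_unit=55.0)
```

In this assumed case the ASIC becomes cheaper only above 19,000 units. A higher mask cost at smaller process geometries raises the ASIC NRE term and pushes the break-even point further out.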

F. DSP-ENHANCED FPGAS

It has been seen that each category of chips has its own virtues and shortcomings.
In order to increase capabilities, companies have tried to combine features of different
classes in one chip. Following this logic, new FPGA models have embedded DSP cells
and the companies have created synthesizable Intellectual Property (IP) cores to
accompany their chips [56].

Regarding embedded DSP capabilities, both Altera's Stratix I family and Xilinx's
Virtex-II family already offered some architectural enhancements to increase DSP
efficiency. These DSP capabilities were provided by hard-wired on-chip multipliers
intended to accelerate operations like multiply-accumulate (MAC) or
multiply-addition (MADD), which are very common in DSP algorithms like the Fast Fourier
Transform (FFT) and Finite Impulse Response (FIR) filters. The core of a typical DSP
block consists of a multiplier followed by an adder and many registers at the inputs and
outputs of the cell (Figure 24) [58]. DSP cells can also be cascaded, which adds more
flexibility in applications like FIR filters.

Figure 24 Internal Structure of a DSP48E cell. (From: [58])

As an example, in its newest chips, the Virtex-5 family, Xilinx offers the DSP48E
slice, which is a 25-bit by 18-bit multiplier along with a 48-bit accumulator. This offers
impressive performance in both speed and power while using little silicon real estate.
The number of DSP slices is limited to between 32 and 192, but still, the DSP
acceleration they offer is noticeable. For even greater convenience, many library
blocks can optionally be implemented using these DSP slices, yielding very fast designs.
Figure 24 shows the block diagram of the DSP48E [59].
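
The multiply-accumulate behavior described above can be sketched in a few lines. This is a purely behavioral model that only mirrors the operand and accumulator widths quoted in the text (25-bit by 18-bit multiply, 48-bit accumulate). It ignores pipeline registers, cascade paths and every other feature of the real DSP48E slice, and the function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Wrap an integer into a two's-complement value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step with 25-bit and 18-bit signed operands
    and a 48-bit signed accumulator (the widths quoted in the text)."""
    product = to_signed(a, 25) * to_signed(b, 18)
    return to_signed(acc + product, 48)


def fir_output(samples, coeffs) -> int:
    """Accumulate one FIR output sample by chaining MAC steps."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = mac(acc, x, h)
    return acc
```

In hardware, each loop iteration would correspond either to one clock cycle of a single slice or, when slices are cascaded, to one slice in a chain working in parallel.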

Regarding soft cores, Actel delivers synthesizable versions of ARM7 (CoreMP7)
and ARM Cortex (M1, M3) free of charge. Advanced RISC Machine (ARM) processors
in their hardwired form are processors mainly developed for mobile devices with a
Reduced Instruction Set Computing (RISC) core. The parent company that develops the
new ARM processors only licenses them, without manufacturing them on its own. Every
reputable vendor in the electronics sector owns at least one license. ARM’s new line of
products also includes synthesizable cores, like the older ARM7 and the new Cortex, that
can simply be downloaded to an FPGA. Thus, inside an FPGA it is possible to have a GPP
plus some silicon left for other designs. The embedded cores can run different real time
operating systems (RTOS) and support modern connectivity protocols, like Gigabit
Ethernet and RapidIO [60].

Xilinx went one step further by producing FPGAs with built-in PowerPC® 440
blocks. In some models of Virtex-5, Xilinx even includes two PowerPC cores [61]. Altera
not only offers its own RISC processor (Nios® II) along with the ARM Cortex-M1, but
recently also added Freescale’s 32-bit V1 ColdFire [60]. Freescale is another
manufacturer that produces microprocessors for embedded devices, and its ColdFire chip
is a 68k series microprocessor.

Regardless of the previous advancements, some operations still exist that are not
suitable for FPGAs, like division by a number that is not a power of 2, especially between
floating point numbers [62]. Sometimes, these operations are implemented with look-up
tables, but this approach has shortcomings, and such operations are easier to implement in DSP chips. For
this reason, modern platforms combine DSPs and FPGAs in order to achieve
maximum performance.
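
To illustrate the look-up table approach to division mentioned above, the following sketch replaces a division by a table lookup, one multiplication and one shift. It is only a sketch under stated assumptions: an 8-bit unsigned numerator, integer divisors from 1 to 255, and 16 fractional bits in the reciprocal table. The names and parameters are hypothetical and this is not the method of [62].

```python
FRAC_BITS = 16     # fixed-point fractional bits of the reciprocal (assumed)
MAX_DIVISOR = 255  # largest divisor held in the table (assumed)

# Precomputed table: RECIP[d] is 2**FRAC_BITS / d rounded up; entry 0 is unused.
RECIP = [0] + [-(-(1 << FRAC_BITS) // d) for d in range(1, MAX_DIVISOR + 1)]


def lut_divide(n: int, d: int) -> int:
    """Approximate n // d with one table lookup, one multiply and one shift.

    Exact for 0 <= n <= 255 with the table parameters above; larger
    numerators would need more fractional bits.
    """
    if not 1 <= d <= MAX_DIVISOR:
        raise ValueError("divisor outside table range")
    return (n * RECIP[d]) >> FRAC_BITS
```

The shortcoming is visible in the parameters: the table grows with the divisor range, and accuracy depends on the reciprocal width, which is why wide or floating point division remains costly in FPGA fabric.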

G. THE ROLE OF FPGAS IN SDR - HOW TO COMBINE DSP-FPGA COPROCESSOR

A goal of SDR is the ability for a single transceiver to conform to multiple
different air interfaces and modulation formats. The design that would accommodate all
present and future needs of an SDR product must be flexible, scalable and of high
performance. The use of many DSP processors in a parallel configuration is not practical
because of complexity and power consumption. Furthermore, there should be a margin
between the performance of the processor and the current needs in order to accommodate
any future demand. An example taken from the video compression area of mobile
devices is the comparison of the standard MPEG-2 with the newest H.264. The
algorithmic complexity of high definition resolution H.264 is three times that of standard
definition resolution MPEG-2 video compression, which translates into an order of
magnitude increase in required system performance. Thus, there should be enough computational
power even for future standards. Otherwise, no update can be performed, thus reducing
versatility [63].

This reconfigurability does not come without a cost in computation and
complexity, because analog parts cannot be used extensively anymore and digital
circuits do not always have the bandwidth to support wideband communications. As
has been demonstrated, no single family of chips can provide all the characteristics needed
in order to make SDR a reality. The power of modern FPGAs offers much flexibility and
can help realize the SDR concept. Nevertheless, in order to combine all the features needed, a
cross-chip platform is necessary. In practice, typical SDR platforms include all three,
DSPs, FPGAs and GPPs, to deal with the complexity, cost and power constraints.

The initial use of FPGAs as mere interconnecting logic between external
interfaces and computational chips (or chips in the system) or between DSPs and GPPs
has now changed. FPGAs are also used as fabric where special circuits are built, in cases
where speed requires implementation of DSP functions in hardware. Thus, they are
used as coprocessors to either DSPs or GPPs, in order to accelerate some functions that
are frequently used or could benefit from a parallel structure. Tools provided by the
manufacturers of the FPGAs make the mapping from high-level languages to Hardware
Description Languages (HDL) straightforward [64].

The  different  functions  that  usually  are  assigned  to  these  devices  are  illustrated  in 
Figure  25  [65]. 
[Figure 25: example architecture splitting SDR functions across GPP, FPGA and DSP.]

GPP:
• Low-Speed Packet Processing
• Complex MAC Layer Protocol
• Network Level Protocols
• Waveform Management
• Tx Packet Construction
• Rx Packet Decode
• Waveform Load
• Waveform Execution Control

FPGA:
• Modem External Interface
• Down Conversion to Baseband
• Up Conversion to IF
• Signal Filtering
• Sample Rate Decimation/Interpolation
• High-Speed Mod and Demod
• High-Speed AGC
• High-Speed FEC
• High-Speed Packet Processing

DSP:
• Medium-Speed Timing
• Critical Low-Speed Signal Filter
• Sample Rate Decimation
• Sample Rate Interpolation
• Low-Speed Mod and Demod
• Low-Speed AGC
• Medium-Speed FEC
• Medium-Speed Packet Processing
• Simple MAC Layer Protocols

Figure 25 Different Functions Assigned to GPPs, FPGAs, and DSPs (From: [65]).


An  FPGA  used  as  a  coprocessor  seems  to  yield  a  balanced  solution.  In  such  cases, 
the  DSP  code  must  be  partitioned  into  the  parts  that  will  be  executed  by  the  DSP 
processor and by the FPGA. In “Hybrid FPGA/DSP architecture: the optimal solution”
by Jeffry Milrod [66], the author mentions that the FPGA should be placed close to the
signal I/O. This configuration can use the FPGA as a reconfigurable I/O controller in
support of various standards (like PCI Express, Gigabit Ethernet, etc.). Also, it solves the
bandwidth problem of connections between fast I/O devices and the core.

The general guidelines that should be followed during the design of such systems
are described in [67] and warn against the folly of trying to port code previously written
for a DSP platform to the new architecture. The serial, sequential logic of a DSP has
little in common with the parallel logic of FPGA designs. Other guidelines refer to the split
of tasks between the two chips, suggesting that the control part is
better instantiated in the FPGA, because many soft embedded processors are offered
for FPGAs. The paper also refers to the evaluation of different choices regarding
intellectual property in the design. While producing intellectual property is more
expensive, time to market may force its purchase.
Another point of great importance is the bandwidth of the interconnection
between the DSP and the FPGA. In hybrid architectures, much data goes back
and forth between the main computational elements, depending on where the
computation is more efficient. Thus, in order to be applicable in practice, the interfaces
must be fast and of low latency [68].

Texas Instruments' Small Form Factor Software-Defined Radio is an
implementation example that uses a Xilinx Virtex-4 SX-35 FPGA, TI's TMS320C64x DSP, a
600 MHz chip, and an ARM926EJ-S processor. As expected, the DSP undertakes
the signal processing load, while the GPP supports network and application processing
and the Virtex-4 is used for modem co-processing and acceleration functions. The
manufacturer claims that having both a DSP and an ARM on a single chip has the
benefit of reduced system space and cost [69].

H.  BEYOND  THESE  TECHNOLOGIES,  WHAT  NEXT? 

Some new technologies advertise a combination of both FPGA and ASIC
benefits. Usually the chips implementing these technologies offer partial reconfigurability
while keeping other parts hard-wired, placing themselves in between the two extremes. Others
are highly parallel devices that incorporate an internal structure to address the difficult
problem of massive parallelism efficiently. Nevertheless, none of these technologies has
gained a dominant position in the market [70].

eASIC is promoting its so-called Second Generation Structured ASICs,
Nextreme2™ [71]. It is a 45 nm design that belongs to the category of ASIC-FPGA
Hybrids. The way this is implemented is illustrated in Figure 26 [72]. The
specific choice of customizing the routing through a single via mask eliminates the need for the very large
overhead of SRAM elements that flexible routing would need. In this way, the current
consumption is reduced, given the leakage current that SRAM elements
exhibit. The cost per chip is also reduced while keeping the mask charges very low,
which in turn removes the minimum quantity constraints that conventional ASICs would
have.
[Figure 26 panel labels: Mask Customized Routing (Via); SRAM Programmed Logic (LUT)]

Figure 26 Illustration of the concept behind eASIC’s structured ASICs (From: [72]).

On the other hand, Nextreme retains an internal cell structure, called the eCell,
that is the same as in FPGAs. This allows some of the FPGA’s flexibility and reconfigurability. The
company advertises development tool costs and time to market similar to
those of FPGAs. Nextreme can host a plethora of soft cores, including ARM 926EJ and
Tensilica Diamond Standard Processors, which are mainly for audio processing.

There  are  two  device  options,  one  for  prototyping  and  one  for  mass  production. 
The  method  used  to  customize  the  interconnections  in  this  product  is  maskless 
lithography  and  is  called  the  Direct-write  e-Beam.  This  technology  uses  an  electron  beam 
to  write  directly  on  the  wafer.  The  paper  [73]  on  the  company’s  website  describes  this 
technology. 

PicoChip’s picoArray™ is another architecture that has managed to
differentiate itself from the competition. It consists of a massively parallel design where 308
tiled processors are connected in a 2D grid. These 16-bit Harvard processors each have a
small local memory and each one runs its own process. For proper interconnection, they
are all attached to a network of 32-bit buses, the picobuses, and programmable bus
switches. Multiple picoArray cores can be used in a parallel structure to give even more
computational power. In each picoArray, multiple functional acceleration units (FAU)
exist for speeding up some specific tasks, like Advanced Encryption Standard (AES)
encryption. Some models even have an embedded ARM-9 processor. They have proved
to have very good processing capabilities in known DSP calculations, like the FFT or
IFFT, and error control coding and decoding. This chip has been deployed in wireless
infrastructure [74]. See Figure 27 [75].

Figure 27 PicoArray Concept (From: [75]).

I. LIMITATIONS

The question that arises naturally is whether the ultimate goal of software radio can be
achieved. This goal is to build devices that can handle every possible modulation by just
loading the proper software. As described in [76], some parts of the radio are not even
close to digital implementation due to cost or space. The main limitation arises from
data converters (analog to digital and digital to analog) that are not fast enough for most Radio Frequencies (RF).
The solution usually used is to perform analog to digital (Rx) and digital to analog (Tx)
conversion at a low intermediate frequency (IF). The conversion between the IF and RF
is usually performed using analog hardware. The advances made in that domain do not
seem capable of changing that in the near future.
APPENDIX B. IN DEPTH PARAMETER ANALYSIS OF BFSK TRANSCEIVER DESIGN


In this Appendix, a description of the specific function of each block and the
settings of its parameters can be found. Reading this Appendix in parallel with
Chapter IV is recommended for someone not familiar with the Xilinx environment in order to
acquire a better overview of the meaning of each block. It is also useful as a reference
guide for someone who would like to reproduce the circuit or use it as a platform to
extend to a different modulation scheme.

A.  TRANSMITTER  (TOP  LEVEL) 
Note (from the schematic): of the 128 bits, 8 are the preamble and 60 (doubled by the convolver) are the actual info bits.
Channel bits are defined as the preamble plus the output bits of the convolutional encoder.
The exception is the last packet, which includes only 58 info bits in order to accommodate 4 trail bits.

Figure 28 Transmitter’s schematic diagram designed in Simulink/Sysgen environment.

• System Generator: its presence is compulsory in every design. It defines
the type of target FPGA, the FPGA clock period, and other key parameters.

Key parameters:

Part: Virtex-4 xc4vlx25-10sf363. This is the target FPGA.

FPGA clock period (ns): 10.

Clock pin location: A8. This choice depends on the actual pin on which the
board provides the clock pulses, which is found in the board’s manual.
This parameter is crucial for proper function of the design on the FPGA,
and post place-and-route simulation is not feasible if this parameter is not
defined.

Simulink system period (sec): T/128. This number is defined by the fastest
component of the design. The value T corresponds to the desired
information bit period and equals 128*10 ns. The simulation of the design
runs with time steps of T/128 sec, as defined in Simulation Tab ->
Configuration Parameters in the Simulink window. Explicitly, when a
block is defined to work at a sample period T, it yields an output once
every 128 Simulink periods, that is, 128 times slower than the reference
period. In this case, the ‘System Generator’ block
only defines the time base for the other blocks.

• Resource Estimator: It is a block that provides an estimate regarding the
FPGA resources that are required to build the circuit.

•  From  Workspace:  inserts  variables  from  the  Matlab  Workspace. 

Key  Parameters: 

Data: [(0:1901)*T; 1 a 0]' where a = rand(1,1900)<.5 is executed in the
Matlab Command Window.

• Gateway In, Reset, Enable: convert the input to the Xilinx fixed-point type.
The part of the design that follows these blocks is synthesized by System
Generator. Each block itself becomes a top-level input port.

Key Parameters:

Output type: Unsigned, consisting of 1 bit. The input is 0 or 1.

Sample  period:  T.  The  block  is  working  at  the  Simulink  simulation  period 
and  128  times  slower  than  the  reference. 

• Sample Time1: illustrates the simulation period concept discussed in the
‘System Generator’ block. It uses a display block to report the normalized
sample period value. A value of 128 shown in ‘Display 1’ means
that the input is 128 times slower than the reference. For the specific case
in which the ‘System Generator’ block has a value of T/128, it means that the
block preceding ‘Sample Time1’ has a sample period of T.

• T3: terminates its input to avoid warning messages. It also means that
its input is not considered useful in this design. In this case, the ‘TX ready’
signal proved not to help with the problem of the gap between the preamble and
the encoded bits, as discussed in Section C.1, Chapter V.

• Mux1: multiplexer with a select input (sel) of unsigned type and a configurable
number of data bus inputs (d0, d1). The Enable port (en), which is optional,
forces the latency to be at least 1. The exact value of the latency is also
shown on the block’s figure as the negative exponent of the z symbol.
• Delay1: Multiple delays are spread over the design. Sometimes their role
is to ensure timely propagation of the signals; other times they take care of
the synchronization of different branches of the design. This is one case
in which a delay should not have been used, because scopes are not
synthesized, whereas delay blocks are. The proper way is to use a
‘Gateway Out’ block followed by Simulink delays.

• Sample Time, Sample Time2: As discussed in ‘Sample Time1.’

• Gateway Out: Opposite functionality to ‘Gateway In.’ It converts the
Xilinx fixed-point input to a Simulink-compatible type. These blocks are
synthesized into top-level output ports.

Key  parameters: 

Input/Output Buffers (IOB) pad locations:

{'U9','V9','V10','V11','P19','U12'}. These are the output pins from the Most
Significant Bit to the Least Significant Bit. The number of pins is equal to
the number of output bits.

• To Workspace: Stores the values presented at its input as a Matlab
variable for further analysis in the Matlab environment. In this case, the
data will be forwarded to the receiver’s input.

Key parameters:

Variable name: simout. This is the name of the variable that will store the
output values of the Transmitter. They can be recovered through the path
simout.signals.values, because simout is a structure that also saves other
information, such as the time that corresponds to each value.
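The sample-period settings described for the ‘System Generator’, ‘Gateway In’ and ‘Sample Time’ blocks can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python (the design itself is built in Simulink/Sysgen); the function name normalized_period and the constant names are assumptions introduced here. It computes the value that a ‘Sample Time’ block would report for a block running at period T, and for the encoded bits at period T/2, when the Simulink system period is T/128.

```python
from fractions import Fraction

FPGA_CLOCK_NS = 10                        # FPGA clock period set in the System Generator block
T_NS = 128 * FPGA_CLOCK_NS                # information bit period T = 128*10 ns
SYSTEM_PERIOD_NS = Fraction(T_NS, 128)    # Simulink system period T/128

def normalized_period(block_period_ns):
    """Sample period of a block in units of the Simulink system period."""
    return Fraction(block_period_ns) / SYSTEM_PERIOD_NS

gateway_in = normalized_period(T_NS)                  # 'Gateway In' runs at T
encoded_bits = normalized_period(Fraction(T_NS, 2))   # encoded bits run at T/2

print(SYSTEM_PERIOD_NS, gateway_in, encoded_bits)
```

With these settings the reference period coincides with the 10 ns FPGA clock, so a normalized value of 128 simply means one output per 128 clock cycles.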


B. PREAMBLE SUBSYSTEM


Figure 29 Preamble Subsystem.

• Counter1: A counter should be thought of as a clock with an adder. Its
output is typically used by comparators to enable or disable signals. In
this case, the counter is also used to directly provide the next address to
the ‘ROM’ block.

Key parameters:

Count to value: 127. This depends on the packet size and corresponds to a
packet length of 128.

Number of bits: 7. This number must merely accommodate the maximum
value of the counter. In this case, seven bits are enough to represent the
maximum value of 127.

Explicit Sample Period: T/2. In this case, the preamble has a bit period equal
to the encoded bit period. For the specific encoder of choice, this means
that the encoded bits should have half the period of the message bits.
Recall that the ‘Gateway In’ has a sample period of T.

• Convert1: Translates the input to a desired output type. It is needed
because the ‘ROM’ block does not accept addresses that are
not compatible with its depth.

Key parameters:

Output precision: 3. The 7-bit output value of ‘Counter1’ must be
converted to a 3-bit input to the ‘ROM.’

Overflow: Saturate. This does not have any specific impact on the performance
of the design; it just makes it easier to see the output plots in ‘Scope1.’ The
‘ROM’ output will always be the last bit of the sequence while the counter
output is more than seven.

• ROM: It is a single-port read-only memory (ROM). The preamble is
stored in this memory, given that it does not change over time.

Key parameters:

Depth: 8.

Initial value vector: [1 0 1 0 1 0 0 1]. This is the preamble sequence.

Number of bits (Output Precision): 1

• Relational1: It is a comparator that supports a variety of different
comparisons. Here, the output of ‘Counter1’ is compared with a
constant to determine whether the preamble or the encoded message bits
should be transmitted. Whenever ‘Counter1’ is less than or equal to 7, the
preamble is transmitted; otherwise the encoded sequence is selected.

Key parameters:

Comparison: a>b.

• Delay1, Delay2, Delay4: delays that had been used for synchronization
troubleshooting purposes. Their actual value is set to zero and they do not
affect the design, as discussed in Section C.1 in Chapter V.
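The interaction of ‘Counter1’, ‘Convert1’, ‘ROM’ and ‘Relational1’ over one packet can be sketched behaviourally. The Python model below is an illustration only, not the synthesized logic; it assumes that input a of the comparator is the counter value and input b is the constant 7, which is inferred from the description above rather than stated explicitly. The function names are hypothetical.

```python
PREAMBLE = [1, 0, 1, 0, 1, 0, 0, 1]   # ROM initial value vector (depth 8)
PACKET_LENGTH = 128                   # the counter counts from 0 to 127

def saturate(value, bits):
    """Unsigned saturation, modelling a convert block with Overflow: Saturate."""
    return min(value, (1 << bits) - 1)

def preamble_subsystem(count):
    """Return (rom_bit, select_encoded) for one counter value."""
    address = saturate(count, 3)      # 7-bit count reduced to a 3-bit ROM address
    rom_bit = PREAMBLE[address]
    select_encoded = count > 7        # comparison a>b, assuming b = 7
    return rom_bit, select_encoded

packet = [preamble_subsystem(c) for c in range(PACKET_LENGTH)]
rom_bits = [bit for bit, _ in packet]
selects = [sel for _, sel in packet]

print(rom_bits[:10], selects.count(False), selects.count(True))
```

In this model the ROM output holds the last preamble bit once the counter exceeds seven, while the comparator output switches the transmitter to the encoded bits for the remaining 120 positions of the packet.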
C. DATA INPUT SUBSYSTEM


Figure  30  Data  Input  Subsystem. 

• Convolutional Encoder v6_0: an encoder that uses a convolutional
code. The decoder that matches the convolutional encoder is the Viterbi
decoder. Encoders are provided by Xilinx as free-to-use IP blocks,
although the same is not true for the decoders. In digital communications,
encoding is used for forward error correction. In this design, the encoding
is applied to the whole message sequence, and not on a per-packet basis. A
generic diagram of a rate r = 1/2 code is provided in Figure 31 to grant
some further insight, and a more detailed description is provided in Section
B, Chapter II.


Figure 31 A block diagram of a convolutional encoder. (From: [13]).

Key parameters:

Constraint length: 3. This means that the shift register of the encoder has
two flip-flops.

Convolutional code array (octal): [7 5]. This code is the one proposed by
Xilinx for this specific constraint length. It means that the first
output branch of the convolutional encoder (data_out_v(0) in Figure 31)
adds modulo 2 the values stored in all flip-flops and the input value
(111b), and the second branch (data_out_v(1) in Figure 31) uses only
the value of the last flip-flop to the right and the input value (101b).

• Concat: this block concatenates the two inputs into one word. The
ultimate goal of using this block is to multiplex the two output streams of
the convolutional encoder into one stream.

• Parallel to Serial: This block completes the time division multiplexing
started by the ‘Concat’ block. Every word at the input is broken into separate
bits and is sent serially to the output.

Key parameters:

Output order: Most significant word first
Type (Output Precision): Unsigned
Number of bits: 1

Note: Both the ‘Concat’ and ‘Parallel to Serial’ blocks could have been
replaced by a ‘Time Division Multiplexer’ block, where no extra
parameters are needed except the number of inputs.

• Gateway Out1: It converts the Xilinx fixed-point input to a Simulink-
compatible type. It drives the ‘Scope1’ and ‘To Workspace1’ blocks.

Key parameters:

Translate into output port: Disable. This block is not an instance of an
output port. An output pin is not assigned to this port during synthesis.

• To Workspace1: Stores the values presented at its input as a Matlab
variable for further analysis in the Matlab environment. In this case, the
data represents the message sequence and will be used for comparison with the
receiver’s output in order to count the number of errors.

Key parameters:

Variable name: pre_Viterbi_encoder.

• Delay?, Delay1: Delays used to align the plots in ‘Scope1.’ As is the
case with many delays that drive Scopes, this is not a proper way to
implement a delay (see the description of ‘Delay1’ of Figure 28 in Appendix B).

• FIFO: It is a First In First Out memory queue. Each input value occupies the
next available memory location in the queue. Writing is
permitted whenever the write enable (we) signal is high; otherwise the input
data are discarded. In this case, ‘we’ is always high (‘Constant1’), allowing
the encoded message bits to be saved in the memory even while the
preamble sequence is transmitted. The read enable (re) signal is defined by the
Preamble Subsystem and is high for the time that the preamble is not
transmitted. This allows the encoded bits to appear at the input of the
Modulation Subsystem. Outputs ‘%full’ and ‘full’ are not used and are
terminated by the ‘T1’ and ‘T’ blocks. It had not been possible to use the
‘Empty’ signal (see also Troubleshooting of the Transmitter in Chapter V), and it is
terminated just outside the Data Input Subsystem.

• Constant1: Constant of value 1 that keeps the ‘we’ signal of ‘FIFO’
always high.

• Inverter: performs bitwise negation of the Boolean value of its input.
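To make the meaning of the code array [7 5] concrete, the encoder described above, followed by the serialization performed by ‘Concat’ and ‘Parallel to Serial’, can be modelled in a few lines. This is a minimal Python sketch and not the Xilinx IP core; the function name conv_encode is hypothetical, and the order in which the two output branches are serialized (branch 0 first) is an assumption, since it depends on how the inputs of ‘Concat’ are wired.

```python
def conv_encode(bits):
    """Rate-1/2, constraint-length-3 encoder with generators 7 and 5 (octal)."""
    s1 = s2 = 0                  # two flip-flops; s1 holds the most recent past bit
    out = []
    for u in bits:
        v0 = u ^ s1 ^ s2         # generator 111b: input and both flip-flops
        v1 = u ^ s2              # generator 101b: input and the last flip-flop
        out.extend([v0, v1])     # serialize the two-bit word, v0 first (assumed order)
        s1, s2 = u, s1           # shift-register update
    return out

encoded = conv_encode([1, 0, 1, 1])
print(encoded)                   # [1, 1, 1, 0, 0, 0, 0, 1]
```

Each message bit produces two channel bits, which is why the encoded bit period is half the message bit period and why 60 information bits fill 120 positions of a packet.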

D.  MODULATION  SUBSYSTEM 


Figure 32 annotations: If 0, select the frequency of input d0; else choose the
frequency of input d1. State_machine: waits for the first 1 of the preamble in order
to enable the mux.


Figure  32  Modulation  Subsystem. 

• Mux: same description as ‘Mux1’ of Figure 28 in Appendix B. Same
parameters.

• Delay1, Delay2, Delay3: Delays that ensure the timely propagation of the
signals. It is often necessary to insert delays between adjacent blocks.

• DDS Compiler v2_0 and DDS Compiler v2_0_1: Direct Digital
Synthesizers (also known as Numerically Controlled Oscillators) that produce a
sinusoidal output using a lookup table. One DDS is devoted to generating
the frequency assigned to 0 and the other is used for the generation of the 1’s
frequency. The output width is by default 6 bits, all placed after the binary
point. Since the first bit is the sign bit, only 5 bits are for magnitude. This
means the output takes values from -0.5 to +0.5 and not from -1 to +1.
Key parameters:

DDS clock rate (MHz): 100.0. This number must be at least twice the
output frequency and at most 500 for the Virtex-4 target FPGA. According
to an email from Xilinx technical support, the DDS clock rate
should match the parameter ‘FPGA clock period (ns)’ in the Sysgen token. In
this case the FPGA clock period is 10 ns, yielding a frequency of 100 MHz.

Frequency resolution (Hz): 0.03.

Output Function: Cosine

Output Frequency array (MHz): [45.0] for the 1s and [40.0] for the 0s.
These choices depend on the bit duration as well. The bit period of the
encoded bits is 64*10 ns. For the two frequencies to be orthogonal, as
shown in Section A in Chapter II, their spacing must be a multiple of the
bit rate. In this case the spacing of 5 MHz is 3.2 times the bit rate
(1/(64*10 ns) = 1.5625*10^6 bits/s), and the frequencies are not orthogonal.

Explicit period: T/128. The decision made was to use 64 samples of the
sinusoid for every bit. Given that the bits at the entrance of the Modulation
Subsystem are at a period of T/2, the signal that has the proper
period is one with a value set to T/128.

Noise  Shaping  (Under  Advanced  tab);  Phase  dithering  [77].  This  choice 
should  improve  the  quality  of  the  sinusoidal  samples,  minimizing  the 
quantization  error. 

DSP48  Use  (Under  Implementation  Tab);  Maximal.  Given  that  DSP48 
cells  are  not  used  anywhere  else  in  the  transmitter,  some  of  them  can  be 
sacrificed  to  increase  the  performance  of  this  block. 
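The orthogonality remark made for the output frequencies can be reproduced numerically. The sketch below is illustrative Python, with names chosen here for convenience; it applies the criterion exactly as stated in the text (tone spacing equal to an integer multiple of the encoded bit rate). The last value it computes, the nearest smaller spacing that would meet the criterion, is only a hypothetical alternative and was not used in the design.

```python
from fractions import Fraction

SAMPLE_PERIOD_NS = 10
SAMPLES_PER_BIT = 64
bit_period_ns = SAMPLES_PER_BIT * SAMPLE_PERIOD_NS       # 640 ns per encoded bit
bit_rate_hz = Fraction(10**9, bit_period_ns)             # encoded bit rate in bits/s

f1_hz, f0_hz = 45_000_000, 40_000_000                    # tones for the 1s and the 0s
spacing_ratio = Fraction(f1_hz - f0_hz) / bit_rate_hz    # spacing in units of the bit rate
is_orthogonal = spacing_ratio.denominator == 1           # True only for an integer multiple

# Hypothetical alternative: largest spacing below 5 MHz that is an integer multiple
lower_spacing_hz = (spacing_ratio.numerator // spacing_ratio.denominator) * bit_rate_hz

print(float(bit_rate_hz), float(spacing_ratio), is_orthogonal, float(lower_spacing_hz))
```

The ratio of 3.2 confirms the statement above; a spacing of three times the bit rate would correspond to 4.6875 MHz.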

• MCode1: This block is used to execute simple assigned Matlab functions.
The code is translated into VHDL or Verilog during the synthesis
phase. It only supports a small subset of the MATLAB language. In cases
where this is a problem, the Xilinx AccelDSP Synthesis tool can be used to
support a larger set of Matlab commands and to create custom IP blocks.
The MCode block only supports the Xilinx fixed-point type.

The function assigned to this block has to do with the initialization of the ‘Mux’
block. In order to avoid the propagation of undefined signals during the initialization
phase, a state machine was written that delays the enable (en) of the ‘Mux’ block
until the first bit of the preamble is detected at the input of the Modulation
Subsystem.

Code: 


function enable = state_machine(din, reset)

  % Define the state variable. It is retained in memory between
  % consecutive runs.
  persistent state, state = xl_state(0, {xlUnsigned, 1, 0});

  switch state
    case 0
      if din == 1       % when the first bit of the preamble is detected
        state = 1;      % go to the next state (enable high)
      else
        state = 0;
      end
      enable = xfix({xlBoolean}, 0);   % xfix() translates values...
                                       % ...to a Xilinx fixed-point type.
    case 1
      if reset == xfix({xlBoolean}, 1)   % check synchronous reset
        state = 0;
        enable = xfix({xlBoolean}, 0);
      else
        enable = xfix({xlBoolean}, 1);   % otherwise stay locked to...
                                         % ...the same state (enable high)
      end
    otherwise
      state = 0;
      enable = xfix({xlBoolean}, 0);
  end


•  Up  Sample:  up  samples  input  data  by  inserting  zeros  or  copies  of  previous 
sample.  It  is  used  to  make  the  sample  rate  of  the  ‘din’  and  ‘reset’  signals 
compatible.  System  Generator  does  not  accept  a  state  machine  with  inputs 
of  different  sampling  periods. 

Key  parameters: 

Copy  samples:  enabled. 

• Shift: This block generally performs a left or right shift on the input. The
purpose of this block is to amplify the signal before transmission. It should
be noted that the DDS blocks yield values from -0.5 to +0.5 and not from -1 to
+1.

Key parameters:

Shift direction: Left. This direction amplifies the signal.

Number of bits (Shift direction): 2. This number amplifies the signal
by a factor of four.

Number of bits (Output type): 6. The total number of bits does not change,
only the position of the binary point.

Binary point (Output type): 4. The change of the position of the binary
point reflects the shift made by the block. The previous binary point
position of 6 has now changed to 4, meaning that the binary point is shifted
two places to the right.
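The effect of moving the binary point can be verified by interpreting the same 6-bit patterns with the two binary point positions. The following Python lines are a small sketch for illustration; the helper fix_to_real is a name introduced here, and the model ignores any hardware detail beyond the number format.

```python
def fix_to_real(raw, binary_point):
    """Interpret a signed integer bit pattern as a fixed-point value."""
    return raw / (1 << binary_point)

WIDTH = 6
raw_min = -(1 << (WIDTH - 1))          # most negative 6-bit pattern (-32)
raw_max = (1 << (WIDTH - 1)) - 1       # most positive 6-bit pattern (31)

dds_range = (fix_to_real(raw_min, 6), fix_to_real(raw_max, 6))       # binary point at 6
shifted_range = (fix_to_real(raw_min, 4), fix_to_real(raw_max, 4))   # binary point at 4
gain = shifted_range[0] / dds_range[0]

print(dds_range, shifted_range, gain)
```

The bit patterns are unchanged; only their interpretation moves from roughly ±0.5 to roughly ±2, which is the amplification by four mentioned above.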
E. RECEIVER (TOP LEVEL)


Figure  33  Receiver’s  schematic  diagram  designed  in  Simulink/Sysgen  environment. 

• System Generator: this block is compulsory in every design. It defines
the type of target FPGA, the FPGA clock period, and other key
parameters.

Key  parameters: 

Part: Virtex-4 xc4vlx25-10sf363. This is the target FPGA.

FPGA  clock  period  (ns):  10. 

Clock pin location: A8. This choice depends on the actual pin on which the
mainboard provides the clock pulses; it is found in the board’s manual.
This parameter is crucial for the proper function of the design on the FPGA,
and post place and route simulation is not feasible if this parameter is not
defined.

Simulink system period (sec): t, where t = 10*10^-9 is defined in the Matlab
Workspace. In contrast to the same block of the Transmitter, here the input
consists of samples, not bits, so it arrives at the rate of the fastest blocks in
this design. Given that the time step of the Simulink simulation is t, one Simulink
simulation period is equal to the reference period of the Xilinx model.
• Resource Estimator: It is a block that provides an estimate regarding the
FPGA resources that are required to build the circuit.

• From Workspace: inserts variables from the Workspace. Here, this variable
holds the stored output values of the Transmitter.

Key parameters:

Data: [(1:length(simout.signals.values))*t; [simout.signals.values]']' where
simout.signals.values represents the output samples of the Transmitter.

• Gateway In: converts the input to the Xilinx fixed-point type. The part of the
design that follows this block is synthesized by System Generator. The
block itself becomes a top-level input port.

Key  parameters: 

Output type: Signed, consisting of six bits with the binary point at the
fourth position (four fractional bits). The input is cosine samples amplified by a
factor of four. This should match the output type of the Transmitter.

Sample period: t. The block is working at the Simulink simulation period
and at the reference period as well.

• DDS Compiler v2 for 1s and DDS Compiler v2 for 0s: Direct Digital
Synthesizers (also known as Numerically Controlled Oscillators) that produce a
sinusoidal output using a lookup table. One DDS is devoted to generating
the frequency assigned to 0 and the other is used for the generation of the
1’s frequency. The output width is by default 6 bits, all placed after the
binary point. Since the first bit is the sign bit, only 5 bits are for
magnitude. This means the output takes values from -0.5 to +0.5 and not
from -1 to +1.

Key parameters:

DDS clock rate (MHz): 100.0. This number must be at least twice the
output frequency and at most 500 for the Virtex-4 target FPGA. Here, the
value matches the respective value of the transmitter.

Frequency resolution (Hz): 0.03.

Output Function: Cosine

Output Frequency array (MHz): [45.0] for the 1s and [40.0] for the 0s.
These choices should match the values defined in the Transmitter.

Explicit period: t. The block is working at the Simulink simulation period
and at the reference period as well.

Noise  Shaping  (Under  Advanced  tab):  Phase  dithering.  This  choice  should 
improve  the  quality  of  the  sinusoidal  samples,  minimizing  the  quantization 
error. 

DSP48  Use  (Under  Implementation  Tab):  Maximal.  Given  that  DSP48 
cells  are  not  used  anywhere  else  in  the  receiver,  some  of  them  can  be 
sacrificed  to  increase  the  performance  of  this  block. 
• Mult, Mult1, Mult2, Mult3: This block multiplies its two inputs. It
should be noted that because the input from ‘Gateway In’ takes values
from -2 to +2 and the input from the DDS is from -0.5 to +0.5, the output is
between ±1.

Key parameters:

Precision: full.

Use embedded multipliers (Implementation): Enabled. This choice will
make use of the DSP48 embedded cells to execute the operation faster. It
also releases generic cells that can be used for a different purpose.

• Relational: a comparator that supports a variety of different
comparisons. Here, it compares the outputs of the two filters. When the
output of the Non-Coherent Matched Filter for 1s is higher than the output
of the Non-Coherent Matched Filter for 0s, the decision is that a 1 was
transmitted.

Key parameters:

Comparison: a>b.

Provide enable port: enabled.

Latency: 1. Whenever the enable input is chosen, the latency must be one
or more.

• Constant2: Provides the enable signal to Relational.

• Delay1: delay measured exactly to overcome initialization problems.
Before the propagation of Constant2, the output of the delay is zero.

• Gateway Out: It converts the Xilinx fixed-point input to a Simulink-
compatible type. These blocks are synthesized into top-level output ports.
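The statement that the mixer outputs lie between ±1, and the format that a full-precision multiplier produces, follow directly from the two input formats. The Python sketch below is an illustration under the usual rule that a full-precision signed product has a width equal to the sum of the input widths and a binary point equal to the sum of the input binary points; the helper names are hypothetical.

```python
def full_precision_product(width_a, bp_a, width_b, bp_b):
    """Width and binary point of a full-precision signed fixed-point product."""
    return width_a + width_b, bp_a + bp_b

def signed_range(width, bp):
    """Smallest and largest value of a signed fixed-point format."""
    scale = 1 << bp
    return -(1 << (width - 1)) / scale, ((1 << (width - 1)) - 1) / scale

rx_min, rx_max = signed_range(6, 4)      # 'Gateway In': 6 bits, binary point 4
dds_min, dds_max = signed_range(6, 6)    # DDS output: 6 bits, binary point 6
prod_width, prod_bp = full_precision_product(6, 4, 6, 6)
largest_product = max(abs(a * b) for a in (rx_min, rx_max) for b in (dds_min, dds_max))

print(prod_width, prod_bp, largest_product)
```

The resulting binary point of 10 is the one that the blocks following the multipliers inherit from their input.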
Figure 34 NC Matched filter subsystem (one of two).

• Accumulator, Accumulator1: Implement the integration concept in the
discrete case. The integration must be over one bit period; thus 64
consecutive samples from the mixers must be added. After 64 samples, a
reset signal is expected to restart the same process for the next bit. The
reset must be synchronized with the beginning of every bit.

Key parameters:

Operation: add.

Output precision: 20 bits. Given that each input from the mixers cannot
exceed a value of 1, that the adder adds 64 samples, and assuming all values
have the same sign, the sum cannot exceed 64. This value corresponds to 6
bits for the integer part plus one for the sign. Given that the binary point is
inferred from the input and is placed at the tenth position, the output should
be at most 17 bits wide. Some extra bits are given.

• FIFO2, FIFO3: description of the block as in Figure 30. The FIFOs here
are used as a convenient way to capture the value of the accumulators just
before the reset. A simple register with an enable port should be sufficient
to yield the same result. Read enable (re) is always high.

• Delay6, Delay11, Delay12, Delay15, Delay16: Delays that ensure the
timely propagation of the signals.

• Down Sample 2, Down Sample 5: This block reduces the sample rate of
the input, discarding the extra values provided by the higher rate input.
The capture of the output value of the accumulator by the FIFOs is made
once per sixty-four sample periods. Given that this value changes once
per bit period (sixty-four sample periods), there is no need for the blocks
after the FIFOs to run at the sample period.

Key parameters:

Sampling rate (number of input samples per output sample): 64*t.
Switching from sample period to bit period.

Sample: Last value of the frame. This choice was made because less
hardware is needed for its implementation. This choice introduces a latency
of at least one.

• Mult4, Mult5: This block multiplies its two inputs. In this case the
multiplication implements the squaring operation by providing the same
signal to both inputs of the block.

Key parameters:

Output type: Unsigned 31 bits with binary point at the tenth position. The
multiplication may double the bits of the integer part. Given that six bits
were calculated to be sufficient for the integer part after the accumulators,
twelve bits are now needed for a total of 23 bits with the sign. Some extra
bits are offered.

• AddSub: This block implements the addition or subtraction operation. In
this case, the addition of the two branches must be made.
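The chain of mixers, accumulators, squarers and adder described above amounts to a non-coherent matched filter per tone, followed at the top level by the comparison of the two filter outputs. The Python sketch below illustrates this principle in floating point; it is not a bit-accurate model of the fixed-point design. It assumes that the quadrature branch of each filter mixes with a sine at the same frequency as the cosine branch, and all function names are introduced here for illustration.

```python
import math

FS_HZ = 100e6                    # sample rate (10 ns sample period)
N = 64                           # samples per encoded bit
F0_HZ, F1_HZ = 40e6, 45e6        # tones assigned to the 0s and the 1s

def tone(freq_hz, phase=0.0):
    """One bit period of a cosine with the given frequency and unknown phase."""
    return [math.cos(2 * math.pi * freq_hz * n / FS_HZ + phase) for n in range(N)]

def nc_matched_filter(samples, freq_hz):
    """Mix with cosine and sine, integrate over one bit, then square and add."""
    w = 2 * math.pi * freq_hz / FS_HZ
    i_sum = sum(s * math.cos(w * n) for n, s in enumerate(samples))
    q_sum = sum(s * math.sin(w * n) for n, s in enumerate(samples))
    return i_sum ** 2 + q_sum ** 2

def decide(samples):
    """Return 1 when the filter for the 1s gives the larger output, otherwise 0."""
    return 1 if nc_matched_filter(samples, F1_HZ) > nc_matched_filter(samples, F0_HZ) else 0

decisions = [decide(tone(F1_HZ, phase=1.0)), decide(tone(F0_HZ, phase=2.5))]
print(decisions)                 # [1, 0]
```

Because the squared in-phase and quadrature sums are added, the decision does not depend on the phase of the received tone, which is the sense in which the receiver is non-coherent.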
G. CORRELATOR’S SUBSYSTEM


Figure 35 Correlator’s Subsystem (one of two).


• Initialization block: This is a custom block that resets the FIR filters at
the beginning of the simulation and does not affect the circuit afterwards.
The need for it appeared after switching from the Xilinx FIR compiler block to
the custom ‘FIR filter’ block, when the message about the propagation of
indeterminate values appeared. It consists of a constant, a delayed constant
and a comparator, as shown in Figure 36. The ‘Relational’ finds input a
higher than input b only at the first cycle of the simulation and at that
instant sends a reset signal to ‘FIR filter’ and ‘FIR filter1.’ After the
first cycle, the delayed ‘Constant2’ renders to input ‘b’ a value that is equal
to input ‘a’ of the ‘Relational,’ forcing ‘reset out’ to go low. It should be
noted that all constants are explicitly sampled at 64*t.
Figure 36 Initialization block.

• FIR filter, FIR filter1: custom blocks that have the functionality of an
accumulator that adds the last 64 values of its input. A detailed description
is given in Section C.2 in Chapter V.

• Delay1, Delay6: see description of blocks ‘Delay6, Delay11, Delay12,
Delay15, Delay16’ of Figure 34.

• Mult8, Mult9: see description of blocks ‘Mult4, Mult5’ of Figure 34.

• AddSub4: see description of block ‘AddSub’ of Figure 34.

Key parameters:

Precision (Output Type): Full.
H. DECISION CIRCUIT


Figure  37  Decision  Circuit. 


•  AddSub2:  block  implements  the  addition  or  subtraction  operation.  In  this 
case,  the  outputs  of  the  correlators  are  subtracted.  In  contrast  with  the 
‘Relational’  block  in  Figure  33,  not  only  the  highest  value,  but  also  the 
exact  value  is  needed.  The  result  is  supplied  to  the  following  ‘MCode’  to 
make  decisions  about  the  timing. 

Key  parameters: 

Output  type:  32  bits  with  the  Binary  point  in  the  tenth  position.  No  extra 
width  is  granted  compared  to  the  previous  block. 

•  Constant!,  Delayl:  Delayed  Enable  to  correct  initialization  problems. 

Key  parameters: 

Explicit  period:  t.  This  block  and  the  blocks  in  the  specific  subsystem  are 
running  at  the  sample  rate. 

•  Counters:  A  counter  should  be  thought  of  as  a  clock  with  an  adder.  Its 
output  can  be  usually  used  by  comparators  to  enable  or  disable  signals.  In 
this  case,  ‘MCode’  is  using  the  counter  to  time  stamp  incidents  of  interest. 
Key parameters:

Counter type: Free Running. As the relative occurrence of the incidents is
of interest, any reset in the middle of a preamble acquisition would destroy
the synchronization process. A reset implemented in such a way as not to
disturb this acquisition is highly recommended.

Output type: Unsigned with 18 bits. A long width was chosen to
accommodate the concept of a free running counter and to make any
restart unlikely.

Explicit period: t. This block and the blocks in the specific subsystem are
running at the sample rate. The result of the subtraction of the Correlators’
outputs is examined at the sample rate.
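The choice of an 18-bit free running counter can be put into numbers. The short Python calculation below is illustrative; it assumes the 10 ns sample period and the 128-bit packets of 64 samples per bit used throughout this design. The modular difference in elapsed() is a hypothetical way of making time-stamp differences tolerant of a wrap-around and is not claimed to be part of the implemented circuit.

```python
COUNTER_BITS = 18
SAMPLE_PERIOD_NS = 10
SAMPLES_PER_BIT = 64
BITS_PER_PACKET = 128

wrap_samples = 1 << COUNTER_BITS                           # counter values before wrap-around
wrap_time_us = wrap_samples * SAMPLE_PERIOD_NS / 1000      # wrap-around interval in microseconds
packets_per_wrap = wrap_samples // (SAMPLES_PER_BIT * BITS_PER_PACKET)

def elapsed(tin, tstamp, bits=COUNTER_BITS):
    """Samples elapsed between two counter readings, tolerant of one wrap-around."""
    return (tin - tstamp) % (1 << bits)

print(wrap_samples, wrap_time_us, packets_per_wrap, elapsed(5, 262140))
```

Under these assumptions the counter restarts once every 32 packets, so a restart during a given preamble acquisition is infrequent but not impossible.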

• MCode0: see description of block ‘MCode1’ of Figure 32. Here,
‘MCode0’ is used to apply a countermeasure against noise. A typical
maximum of the input waveform is around 700. This block
sets to zero every input value that does not exceed a threshold. In
this way, a small amount of noise will not be perceived as signal by the
timing circuit, which would otherwise try to lock onto random noise values.
Because only positive values can trigger a synchronization phase, only the
positive values are suppressed.

Code:

function [di] = pre(d)

  if d < 50 && d > 0   % if the value of d does not exceed the threshold...
    di = 0;            % ...suppress the output
  else
    di = d;            % ...otherwise, let the input pass.
  end

• MCode: see description of block ‘MCode1’ of Figure 32. Here, ‘MCode’
incorporates the logic behind the bit synchronization. It also implements
the granular packet synchronization. It includes a state machine in which
each state represents the next bit of the preamble expected to be received.
Explicitly, state zero waits for the reception of a 1, which is the first
bit of the preamble. State one tries to identify a 0, which is the
second bit of the preamble, and so on. The state machine goes up to the
fifth bit of the preamble and after that it locks the extracted
synchronization timing value.

The input waveform is maximized at the exact moment that all 64 samples
of a 1 have been accounted for by the accumulator of the Correlator
subsystem. The opposite holds for the 0s, where the waveform is
minimized. The ‘MCode’ tries to match these maxima and minima to the
preamble pattern. These maxima and minima also imply the end of a bit.
The times at which they occur help achieve the synchronization of the receiver.

The decision of the final timing of each packet is based on the mean value
of the time of the third and fourth bit. This can change to include more
bits.


Code:

function [total_sync, sync, tsync, reset_counter] = state_receiver(din, tin)

persistent state, state = xl_state(0, {xlUnsigned, 3, 0});
persistent min, min = xl_state(0, {xlSigned, 32, 10});
persistent max, max = xl_state(0, {xlSigned, 32, 10});
% tsync1 stores the time stamp of the acquisition of the first bit of
% the preamble; tsync2 to tsync5 do the same for the following bits.
persistent tsync1, tsync1 = xl_state(0, {xlUnsigned, 18, 0});
persistent tsync2, tsync2 = xl_state(0, {xlUnsigned, 18, 0});
persistent tsync3, tsync3 = xl_state(0, {xlUnsigned, 18, 0});
persistent tsync4, tsync4 = xl_state(0, {xlUnsigned, 18, 0});
persistent tsync5, tsync5 = xl_state(0, {xlUnsigned, 18, 0});


switch state
    case 0  % Search for the first bit of the preamble.
        if (tin - tsync1) < 64   % For each max value found, search next 64
                                 % inputs to ensure no other maximum occurs.
            if din >= max        % If other maximum found, store it...
                max = din;
                tsync1 = tin;    % ...and wait again 64 samples to verify
                min = max;       % this is the only maximum for the time.
                tsync2 = tin;
            else
                if din < min     % Otherwise see if it is minimum to
                    min = din;   % initialize correctly state 1.
                    tsync2 = tin;
                end
            end
            state = 0;
            sync = 0;
            tsync = 0;
        else                     % When no other maximum found in the given...
            state = 1;           % ...time frame, go to next state.
            sync = 1;            % sync high means that this tsync is going
                                 % actually to be used to extract timing
                                 % information. Otherwise the value of tsync
                                 % is ignored.
            tsync = tsync1;      % Give tsync to output.
            max = min;
        end
        total_sync = xfix({xlBoolean}, 0);     % Enabled when the fifth bit of
                                               % the preamble is located
        reset_counter = xfix({xlBoolean}, 0);  % Not allow reset for the
                                               % external counter (not used)

    case 1  % Search for the second bit of the preamble.
        if (tin - tsync2) < 64   % For each min value found, search next 64
                                 % inputs to ensure no other minimum occurs.
            if din <= min        % If other minimum found, store it...
                min = din;
                tsync2 = tin;
                max = min;
                tsync3 = tin;
            else
                if din > max     % Otherwise see if it is maximum to
                    max = din;   % initialize correctly state 2.
                    tsync3 = tin;
                end
            end
            state = 1;
            sync = 0;
            tsync = tsync1;
        else                     % When no other minimum found in the given...
            state = 2;           % ...time frame, go to next state.
            sync = 0;            % sync low means that this tsync is not going
                                 % to be used to extract timing information and
                                 % tsync will be ignored.
            tsync = tsync2;      % Give tsync to output.
            min = max;
        end
        total_sync = xfix({xlBoolean}, 0);
        reset_counter = xfix({xlBoolean}, 0);
    case 2  % Search for the third bit of the preamble
        if (tin - tsync3) < 64   % and go through the procedure of state 0.
            if din >= max
                max = din;
                tsync3 = tin;
                min = max;
                tsync4 = tin;
            else
                if din < min
                    min = din;
                    tsync4 = tin;
                end
            end
            state = 2;
            sync = 0;
            tsync = tsync2;
        else
            sync = 1;
            state = 3;
            tsync = tsync3;
            max = min;
        end
        total_sync = xfix({xlBoolean}, 0);
        reset_counter = xfix({xlBoolean}, 0);
    case 3  % Search for the fourth bit of the preamble
        if (tin - tsync4) < 64   % and go through the procedure of state 1.
            if din <= min
                min = din;
                tsync4 = tin;
                max = min;
                tsync5 = tin;
            else
                if din > max
                    max = din;
                    tsync5 = tin;
                end
            end
            state = 3;
            sync = 0;
            tsync = tsync3;
        else
            sync = 1;
            state = 4;
            % This criterion was chosen. Different combinations of
            % averaging are also possible.
            tsync = xfix({xlUnsigned, 18, 0}, (tsync3 + tsync4)/2);
            min = max;
        end
        total_sync = xfix({xlBoolean}, 0);
        reset_counter = xfix({xlBoolean}, 0);
    case 4  % Search for the fifth bit of the preamble
        if (tin - tsync5) < 64   % and go through the procedure of state 0.
            if din >= max
                max = din;
                tsync5 = tin;
                % no reset for the next step
            % else
                % no store of min for the next step
            end
            state = 4;
            sync = 0;
            % This criterion was chosen. Different combinations of
            % averaging are also possible.
            tsync = xfix({xlUnsigned, 18, 0}, (tsync3 + tsync4)/2);
        else
            sync = 0;
            state = 5;
            tsync = tsync5;
            max = min;
        end
        total_sync = xfix({xlBoolean}, 0);
        reset_counter = xfix({xlBoolean}, 0);
    case 5  % Stay locked waiting for the whole packet to finish.
        if (tin - tsync5) < 7872 - 12   % The time to complete the reception of
                                        % 128 bits given that tsync5 corresponds
                                        % to the 5th bit of the packet.
            total_sync = xfix({xlBoolean}, 1);  % The preamble (up to fifth
                                                % bit) has been successfully
                                                % located.
            state = 5;
            sync = 0;                   % Lock the timing information.
            reset_counter = xfix({xlBoolean}, 0);
        else                            % Preparation to start over.
            if (tin - tsync5) < 7872 + 56
                total_sync = xfix({xlBoolean}, 0);
                reset_counter = xfix({xlBoolean}, 0);
                state = 5;
                sync = 0;
            else                        % Start over with the following parameters:
                reset_counter = xfix({xlBoolean}, 1);
                state = 0;
                total_sync = xfix({xlBoolean}, 0);
                sync = 0;
                max = 0;
                min = 0;
            end
        end
        tsync = tsync5;
        tsync1 = tin;
    otherwise  % Escape state from unexpected condition.
        state = 0;
        sync = 0;
        tsync = 0;
        total_sync = xfix({xlBoolean}, 0);
        reset_counter = xfix({xlBoolean}, 0);
end
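The confirmation rule used in every state of the listing above (an extremum is accepted only if no better value arrives during the following 64 samples) can be sketched in a few lines. The Python fragment below is not the MCode implementation: it models state 0 only, works on a synthetic waveform, and the names `first_confirmed_maximum`, `SAMPLES_PER_BIT`, `PACKET_BITS` and `LOCK_SAMPLES` are hypothetical. The last constant reproduces the figure 7872 used in state 5, under the assumption that it is the number of samples in the 123 bits that follow the fifth bit of a 128-bit packet.

```python
SAMPLES_PER_BIT = 64
PACKET_BITS = 128
# Samples remaining after the fifth bit of a 128-bit packet: 123 * 64.
LOCK_SAMPLES = (PACKET_BITS - 5) * SAMPLES_PER_BIT


def first_confirmed_maximum(samples, window=SAMPLES_PER_BIT):
    """Return the time index of the first maximum that is not matched or
    exceeded during the `window` samples that follow it, or None if the
    input ends before a maximum is confirmed."""
    best_value = 0
    best_time = 0
    for t, value in enumerate(samples):
        if t - best_time < window:
            if value >= best_value:   # new candidate: restart the wait
                best_value = value
                best_time = t
        else:
            return best_time          # candidate survived a full window
    return None


# Synthetic decision signal: silence, a ramp up to a peak, then a ramp down.
signal = ([0] * 40
          + [10 * i for i in range(1, 65)]
          + [640 - 20 * i for i in range(1, 65)])
peak_time = first_confirmed_maximum(signal)
```

The states for the 0 bits of the preamble apply the same idea with the comparison reversed, so that a minimum is confirmed instead of a maximum.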

• Delay9: see description of blocks ‘Delay7, Delay1’ of Figure 30.

• AddSub5, Constant7: There is an inherent delay between the
unprocessed samples at the input of the Correlators and the point where
the timing decision is taken. To compensate for this delay, a constant
value is added to the synchronization time calculated by ‘MCode.’ The
exact value of ‘Constant7’ was determined experimentally.

Key parameters:

Operation: addition.

Constant value: 23. Experimentally determined value.

• Slice: This block extracts a specified portion of the input word and
presents it at the output. The operation implemented here is modulo 64:
the remainder of the input value when divided by 64 is given by its six
least significant bits. The timing information is supplied to a counter
that counts up to 63, which is why the absolute timing information must
be translated to time modulo 64.

Key parameters:

Width of slice (number of bits): 6.

Specify range as: Lower bit location + width.

Relative to: LSB of input.
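The equivalence between keeping the six least significant bits and taking the remainder modulo 64 can be checked with a short Python sketch. The names `OFFSET`, `SLICE_WIDTH` and `timing_mod_64` are hypothetical; the offset of 23 is the experimentally determined constant quoted above, and 18 bits is the width of the time stamps in the ‘MCode’ listing.

```python
OFFSET = 23        # constant added before the slice (value quoted in the text)
SLICE_WIDTH = 6    # number of least significant bits kept by the slice


def timing_mod_64(tsync):
    """Add the fixed offset and keep the six least significant bits."""
    mask = (1 << SLICE_WIDTH) - 1     # 0b111111
    return (tsync + OFFSET) & mask


# The bit mask and the arithmetic remainder agree for every 18-bit time stamp.
agree = all(timing_mod_64(t) == (t + OFFSET) % 64 for t in range(2 ** 18))
example = timing_mod_64(1000)
```

The verification script of Appendix C applies the same operation in arithmetic form, as `mod(tsync+23, 64)`.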

• Delay5: Delays that ensure the timely propagation of the signals. It is
common practice to insert delays between adjacent blocks.

• Delay10: delay used for synchronization purposes.

• Convert1: Translates the input to a desired output type. Enable and reset
inputs can only be driven by Boolean signals. In this case, the ‘sync’ signal
is an unsigned one-bit integer and must be converted to Boolean. This block
may not even require resources when mapped to the FPGA, depending on
the parameters chosen.

Key parameters:

Type (Output precision): Boolean.

• Register: a D flip-flop with latency equal to one sample period. It
provides an optional enable port. When this port is used and the enable is
low, the register does not accept any new value from its input and
continues to hold the same output.

Key parameters:

Optional Ports: Provide enable port. Here the enable is the ‘sync’ output of
the ‘MCode,’ which indicates that a desired bit of the preamble has been
detected.

• Counter: A counter should be thought of as a clock with an adder. Its
output can be used by comparators to enable or disable signals. In
this case, the counter defines a cycle of 64 samples, within which the
accumulators of the NC Matched Filter subsystems must be reset exactly
once. The specific instant of the reset is defined by the value stored in
the ‘Register,’ and the reset signal is created by comparing the
value of the counter with the output of the ‘Register.’

Key parameters:

Counter type: Count Limited.

Count to value: 63. This defines the 64-sample cycle.

Number of bits: 6. To accommodate counting up to 63.

Explicit period: 1. Like the other components in this subsystem, the
‘Counter’ works at the sample rate.

• Relational1: a comparator that supports a variety of comparisons. Here
the comparison is made between the ‘Counter’ and the output of the
‘Register’ to define the exact time instant at which the accumulators of
the NC Matched Filter subsystems must be reset.
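The interplay of the ‘Counter,’ ‘Register’ and relational blocks can be pictured with a small behavioral model. This Python sketch is an illustration only, not the generated hardware: `reset_pulses` and its arguments are hypothetical names, and the stored timing value of 17 is arbitrary.

```python
def reset_pulses(stored_timing, n_samples, period=64):
    """Model a count-limited counter (0..period-1) compared against the
    value held in the register; True marks a reset of the accumulators."""
    pulses = []
    counter = 0
    for _ in range(n_samples):
        pulses.append(counter == stored_timing)   # relational block: a == b
        counter = (counter + 1) % period          # wrap after 'count to value'
    return pulses


pulses = reset_pulses(stored_timing=17, n_samples=256)
reset_times = [t for t, p in enumerate(pulses) if p]
```

In this model the reset occurs exactly once in every 64-sample cycle, at the offset held in the register.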
I.  DECODING SUBSYSTEM


Figure 38  Decoding Subsystem.


• Delay19, Delay20: Delays used to synchronize the ‘preamble end’ and
‘channel bits’ signals. Their values were found experimentally.

Key parameters:

Latency: 5 and 6, respectively.

• Down Sample: This block reduces the sample rate of the input, discarding
the extra values provided by the higher-rate input. Here, it matches the
sampling rate of the ‘preamble end’ signal with that of ‘channel bits.’

• Counter1: see description of block ‘Counter’ of Figure 37. The only
different key parameter is the following:

Key parameters:

Explicit period: 64*t. This block and the other blocks in this subsystem
run at the bit rate. All input signals have been downsampled by a
factor of 64.
• MCode: see description of block ‘MCode1’ of Figure 32. Here, ‘MCode’
performs the fine tuning of the packet synchronization. After receiving
‘preamble end’ high from the ‘Timing Circuit,’ it checks the input bits to
locate the first 1. This should be the last bit of the preamble.

Code:

function we = preamble_detacher(clock, prbl_end, input_bit)

persistent state, state = xl_state(0, {xlUnsigned, 1, 0});
persistent counter, counter = xl_state(0, {xlUnsigned, 16, 0});
persistent delay, delay = xl_state(0, {xlBoolean});

switch state
    case 0  % Wait state.
        if prbl_end == xfix({xlBoolean}, 1)      % When timing circuit locates
                                                 % the 5th bit of the preamble,
            if delay == xfix({xlBoolean}, 0)     % let one cycle pass
                delay = xfix({xlBoolean}, 1);
            else                                 % and search for an input bit = 1
                if input_bit == 1                % to go to the next state.
                    state = 1;
                    counter = clock;
                else                             % Until it is found, stay at the
                    state = 0;                   % current state...
                end
            end
        else
            state = 0;
            counter = 0;
        end
        we = xfix({xlBoolean}, 0);               % ...and output write enable low.
    case 1  % Packet under reception.
        if clock - counter < 120                 % Until it counts 120 bits,
            state = 1;                           % stay in the current state.
        else
            if prbl_end == xfix({xlBoolean}, 1)  % Verify that the 'preamble
                                                 % end' signal went low
                state = 1;
            else
                state = 0;                       % and go to the wait state,
                counter = 0;
                delay = xfix({xlBoolean}, 0);    % resetting the flag.
            end
        end
        we = true;
    otherwise  % Escape state from unexpected condition.
        state = 0;
        we = xfix({xlBoolean}, 1);
        delay = xfix({xlBoolean}, 0);
end

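A rough Python model of the state machine in the listing above may help in following its timing. It is a sketch under simplifying assumptions: Booleans replace the fixed-point types, the ‘otherwise’ branch is omitted, the class and attribute names are hypothetical, and the stimulus (a ‘preamble end’ signal that stays high for 100 bit periods) is invented for the example.

```python
class PreambleDetacher:
    """Behavioral model: wait for the first 1 after 'preamble end' goes
    high, then keep write enable high for the following 120 bits."""

    def __init__(self, payload_bits=120):
        self.state = 0            # 0: wait, 1: packet under reception
        self.start = 0            # clock value when the last preamble bit was seen
        self.delay = False        # one-cycle delay flag
        self.payload_bits = payload_bits

    def step(self, clock, prbl_end, input_bit):
        if self.state == 0:
            if prbl_end:
                if not self.delay:
                    self.delay = True          # let one cycle pass
                elif input_bit == 1:
                    self.state = 1             # last preamble bit found
                    self.start = clock
            else:
                self.start = 0
            return False                       # write enable low
        if clock - self.start >= self.payload_bits and not prbl_end:
            self.state = 0                     # back to the wait state
            self.start = 0
            self.delay = False
        return True                            # write enable high


detacher = PreambleDetacher()
bits = [0, 0, 1] + [1] * 127                   # first 1 arrives at clock 2
we = [detacher.step(clock, clock < 100, bit) for clock, bit in enumerate(bits)]
```

In this run the write enable is low while the last preamble bit is being read, and it then stays high for exactly 120 bit periods.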


• Convert: Translates the input to a desired output type. The output of the
‘Relational’ block is Boolean and must be translated to an unsigned one-bit
integer in order to drive the next blocks.

• FIFO4: A First In First Out memory queue. Each input value occupies
the next available location in the memory queue. Writing is
permitted whenever the write enable (we) signal is high; otherwise, the
input data is discarded. In this case, ‘we’ is driven by the ‘we’ output of
‘MCode.’ The read enable (re) signal is a delayed high. This allows the
encoded bits to appear at the input of the Modulation Subsystem. Outputs
‘empty,’ ‘%full’ and ‘full’ are not used and are terminated by the ‘T,’ ‘T1,’
‘T2’ blocks.

Key parameters:

Depth: 256.

• Constant5, Delay4: The input sequence of bits is stored, but it is read
with a delay in order to accommodate the preamble bits that are screened
out of the stored sequence. For a larger number of received packets, the
value of ‘Delay4’ should be increased. If the delay fails to account for all
the removed preambles, the output sequence is distorted, because copies of
the previous bit are created at the output to fill the time gaps.
Another solution would be to use a decoder with an enable port at that
point. The enable could be driven by the write enable (we) output of
‘MCode’; whenever it was low, it would freeze the Viterbi Decoder
until the next enable high.

Key parameters:

Constant value: 1.

Delay value (Latency): 100. This number could be lower for fewer
received packets or higher for more received packets.

• T, T1, T2, T3: terminate their inputs to avoid warning messages. This
means that their inputs are not used in this design.

• Time Division Demultiplexer: This block splits the input stream into
multiple output streams according to the sampling pattern specified. The
outputs are downsampled compared to the input.

Key  parameters: 

Frame  sampling  pattern:  [1  1].  Every  second  input  bit  is  presented  to  the 
same  output. 

Implementation:  Multiple  Channel. 
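With the frame sampling pattern [1 1], the demultiplexing amounts to separating the even-indexed and odd-indexed bits of the stream, so that each output runs at half the input rate. A minimal Python sketch follows; the function name `demultiplex` and the bit values are hypothetical, and the assumption that the first output takes the first bit of each pair is ours.

```python
def demultiplex(bits):
    """Split a serial stream into two half-rate streams (pattern [1 1])."""
    first = bits[0::2]    # bits 0, 2, 4, ... go to the first output
    second = bits[1::2]   # bits 1, 3, 5, ... go to the second output
    return first, second


channel_bits = [1, 0, 1, 1, 0, 0]
out0, out1 = demultiplex(channel_bits)
```

Each pair (out0[i], out1[i]) then corresponds to the two coded bits produced for one information bit by the rate 1/2 encoder.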

• Viterbi Decoder v6_0: This block decodes convolutionally encoded
data. The block has a green color to emphasize the fact that an extra
license is needed to use it. For the purpose of this project, a free 90-day
license was obtained from the Xilinx website. The same parameters used
in the Convolutional Encoder block must be specified here as well. Extra
capabilities such as soft decision decoding and puncturing are offered but
were not used.

Key parameters:

Constraint length: 3.

Convolutional code array 1 (octal): [7 5].

Coding: Hard.
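For readers who want to experiment with the code parameters listed above, the following Python sketch implements a rate 1/2 convolutional encoder with constraint length 3 and generators [7 5] (octal), together with a textbook hard-decision Viterbi decoder. It is not the Xilinx core and makes no claim about its internal architecture or traceback depth; all names (`G`, `parity`, `encode`, `viterbi_decode`) and the test message are hypothetical, and the bit ordering of the generators (newest bit as most significant) is an assumption.

```python
G = (0b111, 0b101)   # generator polynomials 7 and 5 (octal), constraint length 3


def parity(x):
    return bin(x).count("1") & 1


def encode(bits):
    """Rate 1/2 convolutional encoder; the state holds the two previous bits."""
    state = 0
    out = []
    for b in bits:
        reg = (b << 2) | state               # newest bit in the MSB
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi decoding using the Hamming distance as metric."""
    inf = float("inf")
    metrics = [0, inf, inf, inf]             # the encoder starts in state 0
    paths = [[], [], [], []]
    for i in range(0, len(received), 2):
        pair = received[i:i + 2]
        new_metrics = [inf] * 4
        new_paths = [None] * 4
        for state in range(4):
            if metrics[state] == inf:
                continue                     # state not reachable yet
            for b in (0, 1):
                reg = (b << 2) | state
                expected = [parity(reg & g) for g in G]
                dist = sum(e != r for e, r in zip(expected, pair))
                nxt = reg >> 1
                cand = metrics[state] + dist
                if cand < new_metrics[nxt]:  # keep the survivor path
                    new_metrics[nxt] = cand
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    best = min(range(4), key=lambda s: metrics[s])
    return paths[best]


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]     # ends with two flushing zeros
coded = encode(message)
corrupted = list(coded)
corrupted[3] ^= 1                            # introduce a single channel error
decoded = viterbi_decode(corrupted)
```

Keeping whole survivor paths in lists is convenient for a short example; a hardware decoder would normally use a bounded traceback instead.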
• Constant: Drives the ‘vin’ input of ‘Viterbi Decoder v6_0’ always high.

• Sample Time2, Display2: See description of block ‘Sample Time1’ of
Figure 28.

• To Workspace, To Workspace1: see description of block ‘Workspace1’
of Figure 30. These are custom blocks that combine ‘Gateway Out’ and
‘To Workspace’ to fit more easily in the design.

Key parameters:

Variable name: pre_Viterbi_decoder, after_Viterbi_decoder, respectively.

APPENDIX C.  MATLAB VERIFICATION CODE


The Matlab code written to verify the decision signal shown in ‘Scope5’ of Figure
37 as the input of the ‘MCode’ block is given in this appendix. The decision circuit is the
most critical part of the receiver; thus, reproducing the simulation results of the
Sysgen was important. This code helped locate the problem related to the Digital Discrete
Synthesizers mentioned in Section B.2, Chapter V.

clear din

tstep = 10*10^-9;              % define the simulation step.
simulation_length = 300000;    % define the length of the simulation.
f1 = 45*10^6;
f2 = 40*10^6;

time = 0:tstep:simulation_length*tstep;
input = simout.signals.values(1:simulation_length+1)';
h = ones(1,64);

x_sin = input.*(.5*sin(2*pi*f1*time));
x_cos = input.*(.5*cos(2*pi*f1*time));
% x_sin = [xsin.signals.values];  % In case the output of the Sysgen ...
% x_cos = [xcos.signals.values];  % after the mixers is used.
x_branch = conv(x_sin,h).^2 + conv(x_cos,h).^2;

y_sin = input.*(.5*sin(2*pi*f2*time));
y_cos = input.*(.5*cos(2*pi*f2*time));
% y_sin = [ysin.signals.values];  % In case the output of the Sysgen ...
% y_cos = [ycos.signals.values];  % after the mixers is used.
y_branch = conv(y_sin,h).^2 + conv(y_cos,h).^2;

din = x_branch - y_branch;
din = (din>50 | din<0).*din;
din = [zeros(75,1); din'];     % insert a small delay to match the Sysgen output.
% din = [zeros(108,1); din'];

figure(1)
subplot(2,1,1), plot(din)
title('Decision Signal')
xlabel('time (sec)')
ylabel('amplitude')
xlim([0 simulation_length])

state = 0;
max = 0; min = 0; tsync_plot = 0;
k = 0;     % point counter of tsync
kk = 0;    % point counter of tsync_plot
tsync1 = 150; tsync2 = 150; tsync3 = 150; tsync4 = 150; tsync5 = 150;  % Initialize
tsync = zeros(1,6);


for tin = 181:length(din)
    switch state
        case 0  % Search for the first bit of the preamble.
            if (tin - tsync1) < 64     % For each max value found, search next 64
                                       % inputs to ensure no other maximum occurs.
                if din(tin) >= max     % If other maximum found, store it...
                    max = din(tin);
                    tsync1 = tin;      % ...and wait again 64 samples to verify
                    min = max;         % this is the only maximum for the time.
                    tsync2 = tin;
                else
                    if din(tin) < min      % Otherwise see if it is minimum to
                        min = din(tin);    % initialize correctly state 1.
                        tsync2 = tin;
                    end
                end
                state = 0;
                sync = 0;
                tsync(1+k) = 0;
            else                       % When no other maximum found in the given...
                state = 1;             % ...time frame, go to next state.
                sync = 1;              % sync high means that this tsync is going
                                       % actually to be used to extract timing
                                       % information. Otherwise the value of tsync
                                       % is ignored.
                tsync_plot(1+kk) = tsync1;
                tsync(1+k) = tsync1;   % Give tsync to output.
                max = min;
            end
            total_sync = 0;            % Enabled when the fifth bit of the
                                       % preamble is located

        case 1  % Search for the second bit of the preamble.
            if (tin - tsync2) < 64     % For each min value found, search next 64
                                       % inputs to ensure no other minimum occurs.
                if din(tin) <= min     % If other minimum found, store it...
                    min = din(tin);
                    tsync2 = tin;
                    max = min;
                    tsync3 = tin;
                else
                    if din(tin) > max      % Otherwise see if it is maximum to
                        max = din(tin);    % initialize correctly state 2.
                        tsync3 = tin;
                    end
                end
                state = 1;
                sync = 0;
                tsync(2+k) = tsync1;
            else                       % When no other minimum found in the given...
                state = 2;             % ...time frame, go to next state.
                sync = 1;              % sync high means that this tsync is going
                                       % actually to be used to extract timing
                                       % information. Otherwise the value of tsync
                                       % is ignored.
                tsync_plot(2+kk) = tsync2;
                tsync(2+k) = tsync2;   % Give tsync to output.
                min = max;
            end
            total_sync = 0;

        case 2  % Search for the third bit of the preamble
            if (tin - tsync3) < 64     % and go through the procedure of state 0.
                if din(tin) >= max
                    max = din(tin);
                    tsync3 = tin;
                    min = max;
                    tsync4 = tin;
                else
                    if din(tin) < min
                        min = din(tin);
                        tsync4 = tin;
                    end
                end
                state = 2;
                sync = 0;
                tsync(3+k) = tsync2;
            else
                sync = 1;              % sync high means that this tsync is going
                                       % to be used to extract timing information.
                tsync_plot(3+kk) = tsync3;
                state = 3;
                tsync(3+k) = tsync3;
                max = min;
            end
            total_sync = 0;

        case 3  % Search for the fourth bit of the preamble
            if (tin - tsync4) < 64     % and go through the procedure of state 1.
                if din(tin) <= min
                    min = din(tin);
                    tsync4 = tin;
                    max = min;
                    tsync5 = tin;
                else
                    if din(tin) > max
                        max = din(tin);
                        tsync5 = tin;
                    end
                end
                state = 3;
                sync = 0;
                tsync(4+k) = tsync3;
            else
                sync = 1;
                state = 4;
                % Choose a criterion for timing
                tsync_plot(4+kk) = tsync4;         % Store time that the decision
                                                   % is taken
                tsync(4+k) = (tsync4 + tsync1)/2;  % Timing decision
                min = max;
            end
            total_sync = 0;

        case 4  % Search for the fifth bit of the preamble
            if (tin - tsync5) < 64     % and go through the procedure of state 0.
                if din(tin) >= max
                    max = din(tin);
                    tsync5 = tin;
                    % no reset for the next step
                % else
                    % no store of min for the next step
                end
                state = 4;
                sync = 0;
                % The criterion chosen in case 3
                tsync(5+k) = (tsync4 + tsync1)/2;
            else
                sync = 1;
                state = 5;
                % Choose a criterion for final time
                tsync_plot(5+kk) = tsync5;         % Store time that the decision
                                                   % is taken
                tsync(5+k) = (tsync3 + tsync4)/2;  % Timing decision
                max = min;
            end
            total_sync = 0;

        case 5  % Stay locked waiting for the whole packet to finish.
            if (tin - tsync5) < 7872 - 12   % The time to complete the reception of
                                            % 128 bits given that tsync5 corresponds
                                            % to the 5th bit of the packet.
                total_sync = 1;             % The preamble (up to the fifth
                                            % bit) has been successfully located.
                state = 5;
                sync = 0;                   % Lock the timing information.
            else                            % Preparation to start over.
                if (tin - tsync5) < 7872 + 32
                    total_sync = 0;
                    state = 5;
                    sync = 0;
                else                        % Start over with the following parameters:
                    state = 0;
                    total_sync = 0;
                    sync = 0;
                    max = 0;
                    min = 0;
                    k = k + 6;              % point counter of tsync
                    kk = kk + 6;            % point counter of tsync_plot
                    % tsync(6+k) = tsync5;
                    % tsync1 = tin;
                end
                tsync(6+k) = tsync5;
                tsync1 = tin;
            end
        otherwise  % Escape state from unexpected condition.
            state = 0;
            sync = 0;
            tsync(1+k) = 0;                 % index 1+k avoids an invalid index 0
            total_sync = 0;
    end
end

subplot(2,1,2), plot(tsync_plot, ones(1, length(tsync_plot)))
title('Preamble Detection Signal (tsync)')
xlabel('time (sec)')
ylabel('amplitude')
xlim([0 simulation_length])
ylim([0 1.5])

tsync_final = mod(tsync+23, 64);

figure(2)
plot(tsync_final)
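The core of the script above, the non-coherent decision signal, can be reproduced for a single bit period without the Simulink output. The Python sketch below is an illustration under stated assumptions: the test tones are ideal and noiseless, the sample rate of 100 MHz is inferred from the 10 ns simulation step, and the names `branch_energy`, `decision` and `tone` are hypothetical. It shows that the sign of the difference between the two branch energies identifies the transmitted frequency regardless of the carrier phase.

```python
import math

FS = 100e6              # sample rate implied by tstep = 10 ns
F1, F2 = 45e6, 40e6     # the two BFSK frequencies used in the script
N = 64                  # samples per bit


def branch_energy(samples, freq):
    """Mix with sine and cosine, sum over the bit, and add the squares."""
    i_sum = sum(x * 0.5 * math.sin(2 * math.pi * freq * n / FS)
                for n, x in enumerate(samples))
    q_sum = sum(x * 0.5 * math.cos(2 * math.pi * freq * n / FS)
                for n, x in enumerate(samples))
    return i_sum ** 2 + q_sum ** 2


def decision(samples):
    """Difference of the branch energies: positive favours F1, negative F2."""
    return branch_energy(samples, F1) - branch_energy(samples, F2)


def tone(freq, phase, n=N):
    """One bit period of an ideal tone with an arbitrary carrier phase."""
    return [math.cos(2 * math.pi * freq * k / FS + phase) for k in range(n)]


phases = (0.0, 1.0, 2.5)
d_f1 = [decision(tone(F1, p)) for p in phases]
d_f2 = [decision(tone(F2, p)) for p in phases]
```

Because both the sine and the cosine correlations are squared and added, no knowledge of the carrier phase is needed, which is the property that makes the receiver non-coherent.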